From solegalli at protonmail.com Mon Jan 4 12:56:52 2021
From: solegalli at protonmail.com (Sole Galli)
Date: Mon, 04 Jan 2021 17:56:52 +0000
Subject: [scikit-learn] IterativeImputer
Message-ID: 

Hello team,

I am reading in some of the original MICE articles that, supposedly, each variable should be modelled upon the other variables in the data with a suitable model. So, for example, if the variable with NA is binary it should be modelled with a classifier, and if continuous with a regression model.

Am I correct in understanding that this is not yet possible with the IterativeImputer, because the model set in the estimator parameter is used for all variables? Is there a workaround?

Thanks a lot!

Regards,
Soledad Galli
https://www.trainindata.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Tue Jan 5 03:34:27 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Tue, 5 Jan 2021 09:34:27 +0100
Subject: [scikit-learn] Comparing Scikit and Xlstat for PCA analysis
In-Reply-To: 
References: 
Message-ID: 

Yes:

svd_solver : {'auto', 'full', 'arpack', 'randomized'}, default='auto'

If auto: the solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient 'randomized' method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

If full: run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing.

If arpack: run SVD truncated to n_components calling the ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape).

If randomized: run randomized SVD by the method of Halko et al.

New in version 0.18.0.
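As an illustration of the options above, a minimal runnable sketch (the data here is made-up random noise, used only for illustration; component signs may differ between solvers, but the explained variance should agree):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data, made up purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

for solver in ("full", "arpack", "randomized"):
    # 'arpack' requires strictly 0 < n_components < min(X.shape),
    # so n_components=2 is valid for a 100x5 matrix.
    pca = PCA(n_components=2, svd_solver=solver, random_state=0)
    pca.fit(X)
    print(solver, pca.explained_variance_ratio_.round(4))
```

Up to sign flips of the components, all three explicit solvers recover the same subspace on a small dense matrix like this one.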
On Mon, 28 Dec 2020 at 17:54, Mahmood Naderan wrote:
> Hi Guillaume,
> Thanks for the reply. May I know if I can choose different solvers in the scikit package or not.
>
> Regards,
> Mahmood
>
> On Mon, Dec 28, 2020 at 4:30 PM Guillaume Lemaître wrote:
>
>> n_components set to 'auto' is a strategy that will pick the number of components. The sign of the PC does not matter so much since they are still orthogonal. So change will depend of the solver that should be different in both software.
>>
>> Sent from my phone - sorry to be brief and potential misspell.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From glennmschultz at me.com Tue Jan 5 12:24:53 2021
From: glennmschultz at me.com (Glenn Schultz)
Date: Tue, 5 Jan 2021 11:24:53 -0600
Subject: [scikit-learn] extraction of grid search values
Message-ID: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>

All,

I have a grid search of a gradient boosting classifier. All works well: the best model is extracted and predict works on the model. I would like to extract the cv_results_. My set-up is pretty standard:

gbclassifier = GridSearchCV(GradientBoostingClassifier(),
                            parameters,
                            verbose = 5,
                            n_jobs = 5,
                            cv = ShuffleSplit(n_splits = 5, test_size = .2, random_state = 42),
                            refit = True,
                            scoring = 'roc_auc')

print(gbclassifier.cv_results_)

returns an attribute error: 'Gradient Boosting Classifier' has no attribute cv_results. I am not sure what I am doing wrong. I checked the documentation and followed some SO examples but no progress.
I am missing something; any help is appreciated.

Best,
Glenn

From niourf at gmail.com Tue Jan 5 15:58:56 2021
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 5 Jan 2021 20:58:56 +0000
Subject: [scikit-learn] extraction of grid search values
In-Reply-To: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>
References: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>
Message-ID: 

Glenn,

You need to fit the estimator with some data for the cv_results_ attribute to exist. You may refer to https://scikit-learn.org/stable/getting_started.html

Nicolas

On Tue, 5 Jan 2021 at 17:25, Glenn Schultz via scikit-learn <scikit-learn at python.org> wrote:

> All,
>
> I have a grid search of gradient boosting classifier. All works well the best model is extracted and predict works on the model. I would like to extract the cv_results_ My set-up is pretty standard
>
> gbclassifier = GridSearchCV(GradientBoostingClassifier(),
>                             parameters,
>                             verbose = 5,
>                             n_jobs = 5,
>                             cv = ShuffleSplit(n_splits = 5, test_size = .2, random_state = 42),
>                             refit = True,
>                             scoring = 'roc_auc')
>
> print(gbclassifier.cv_results_)
>
> returns an attribute error 'Gradient Boosting Classifier' has no attribute cv_results. I am not sure what I am doing wrong I checked the documentation and followed some SO examples but no progress. I am missing something any help is appreciated.
>
> Best,
> Glenn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From icefrog1950 at gmail.com Wed Jan 6 06:00:46 2021
From: icefrog1950 at gmail.com (Liu James)
Date: Wed, 6 Jan 2021 19:00:46 +0800
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
Message-ID: 

Hi all,

I'm using a medium-sized dataset, KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset), for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with the log "Process 13851(python) of user xxx dumped core. Stack trace .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".

The hardware: CentOS 8, Intel i9, 128GB RAM; the stack size is set unlimited. The crash can be reproduced.

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ahowe42 at gmail.com Wed Jan 6 06:46:36 2021
From: ahowe42 at gmail.com (Andrew Howe)
Date: Wed, 6 Jan 2021 11:46:36 +0000
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capacity. Perhaps the full stack trace would be helpful.

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile
ResearchGate Profile
Open Researcher and Contributor ID (ORCID)
Github Profile
Personal Website
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote:

> Hi all,
>
> I'm using a medium dataset KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with log "Process 13851(python) of user xxx dumped core.
Stack trace
> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>
> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set unlimited. Such crash can be reproduced.
>
> Thanks.
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Wed Jan 6 09:31:38 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 6 Jan 2021 15:31:38 +0100
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

And it seems that the piece of traceback refers to NumPy.

On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote:

> A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capacity. Perhaps the full stack trace would be helpful.
>
> Andrew
>
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> J. Andrew Howe, PhD
> LinkedIn Profile
> ResearchGate Profile
> Open Researcher and Contributor ID (ORCID)
> Github Profile
> Personal Website
> I live to learn, so I can learn to live. - me
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>
> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote:
>
>> Hi all,
>>
>> I'm using a medium dataset KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with log "Process 13851(python) of user xxx dumped core. Stack trace .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>>
>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set unlimited.
>> Such crash can be reproduced.
>>
>> Thanks.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From icefrog1950 at gmail.com Fri Jan 8 00:33:34 2021
From: icefrog1950 at gmail.com (Liu James)
Date: Fri, 8 Jan 2021 13:33:34 +0800
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

Thanks for the reply. I tested different sizes of data on different distros, and found that when the data is over 500 thousand rows (with 50 columns) the crash happens with the same error message -- a kernel page error.

On Wed, 6 Jan 2021 at 10:33 PM, Guillaume Lemaître wrote:
- me >> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >> >> >> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote: >> >>> Hi all, >>> >>> I'm using a medium dataset KDD99 IDS( >>> https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) >>> for model training, and the dataset has 2 million samples. When using >>> fit_transform(), the OS crashed with log "Process 13851(python) of user xxx >>> dumped core. Stack trace >>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ". >>> >>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set >>> unlimited. Such crash can be reproduced. >>> >>> Thanks. >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Fri Jan 8 04:09:54 2021 From: ahowe42 at gmail.com (Andrew Howe) Date: Fri, 8 Jan 2021 09:09:54 +0000 Subject: [scikit-learn] 2 million samples dataset caused python and OS crash In-Reply-To: References: Message-ID: Doesn't seem like a sklearn issue, but an OS / hardware issue. Again, a full stack trace would be useful information. Either way, you can try training on a sample or via cross-validation. I believe some estimators can also use incremental training. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. 
Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Fri, Jan 8, 2021 at 5:35 AM Liu James wrote: > Thanks for reply. I tested different size of data on different distros > ,and found when data is over 500 thousand rows (with 50 columns), the crash > will happened with same error message -- kernel page error. > > Guillaume Lema?tre ?2021?1?6??? ??10:33??? > >> And it seems that the piece of traceback refer to NumPy. >> >> On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote: >> >>> A core dump generally happens when a process tries to access memory >>> outside it's allocated address space. You've not specified what estimator >>> you were using, but I'd guess it attempted to do something with the dataset >>> that resulted in it being duplicated or otherwise expanded beyond the >>> memory capacity. Perhaps the full stack trace would be helpful. >>> >>> Andrew >>> >>> >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> J. Andrew Howe, PhD >>> LinkedIn Profile >>> ResearchGate Profile >>> Open Researcher and Contributor ID (ORCID) >>> >>> Github Profile >>> Personal Website >>> I live to learn, so I can learn to live. - me >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> >>> >>> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote: >>> >>>> Hi all, >>>> >>>> I'm using a medium dataset KDD99 IDS( >>>> https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) >>>> for model training, and the dataset has 2 million samples. When using >>>> fit_transform(), the OS crashed with log "Process 13851(python) of user xxx >>>> dumped core. Stack trace >>>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ". >>>> >>>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set >>>> unlimited. Such crash can be reproduced. >>>> >>>> Thanks. 
>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Thu Jan 14 07:57:40 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Thu, 14 Jan 2021 07:57:40 -0500 Subject: [scikit-learn] [Data Umbrella AFME sprint] share Message-ID: Hello, There is an upcoming scikit-learn open source sprint to increase participation of folks in the **Africa and Middle East** regions. If you are located in **Africa and Middle East**, or have contacts there, please share: Data Umbrella has organized a scikit-learn open source sprint for 06-Feb-2021, with a focus on **Africa and Middle East** regions. A sprint is a 4-hour online hackathon where data scientists / developers will work with a pair programming partner on a beginner-friendly issue in the scikit-learn repo. Some knowledge of python, scikit-learn and machine learning is required. This sprint is an excellent opportunity to increase machine learning and python skills, get mentorship from core developers of the library and get started in contributing to open source. 
Full details are available here: https://afme2021.dataumbrella.org

Also, here are social media links that can be shared:
- Twitter [a]
- LinkedIn [b]
- Facebook [c]

[a] https://twitter.com/DataUmbrella/status/1346486322958131202
[b] https://www.linkedin.com/feed/update/urn:li:activity:6752255120714579968/
[c] https://www.facebook.com/data.umbrella.dei/photos/a.156775909179975/432596991597864/

Application deadline: 22-January-2021

We are happy to answer any questions. They can be sent to: data.umbrella.dei at gmail.com

Best,
Reshama
---
Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matematica.a3k at gmail.com Fri Jan 15 19:41:50 2021
From: matematica.a3k at gmail.com (Matemática A3K)
Date: Fri, 15 Jan 2021 19:41:50 -0500
Subject: [scikit-learn] [ANN] The covid-ht project
Message-ID: 

From https://covid-ht.herokuapp.com/about:

According to Dr. Eugenia Barrientos[1], an ongoing viral infection can be detected from the results of a hemogram test, and, given the current COVID19 pandemic, all viral infections with cold and flu symptoms should be treated as COVID19 cases. The inference from the hemogram test results is made based on the knowledge and experience of the Health Professional. If that process could be automated and made widely available, the detection toolkit of Health Professionals would be improved.

In many places (e.g. Perú) where specific COVID19 testing is not widely available - saturated hospitals, not affordable or unavailable - hemogram blood testing is the opposite: affordable and offered in widely distributed facilities. If a viral infection classifier with adequate accuracy on hemograms can be built and made publicly available, all Health Professionals with a smartphone and Internet access could classify any hemogram with the same accuracy as top-level experts on the matter.
Early detection is deemed to be the greatest success factor in COVID19 treatments. This project aims to provide a tool to efficiently build and manage that classifier and make it effectively available for widespread use in order to improve detection and increase the use efficiency of specific testing of COVID19. This tool is totally transparent: you may audit it entirely to fully understand how it works, what it provides and its limitations. It is distributed under the GNU LGPLv3 license. Improvements in early detection should increase successful treatments, potentially saving lives. Better resource efficiency can also be achieved with the tool, i.e. only use expensive specific COVID19 testing for recovery after the hemogram does not indicate infection. The tool is not a replacement of Health Professionals. Any diagnostic and treatment should be decided by a Health Professional with the patient. If you are an individual with a recent hemogram result, the tool may indicate to take preemptive care and seek a Health Professional. Also, don't blame the knife providers: This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Everybody is welcome to join the community for building it and use it: covid-ht+subscribe at googlegroups.com and https://github.com/math-a3k/covid-ht . Made with love for all humans of the world. [1] https://youtu.be/ZO6EaAz465Y?t=570 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From jean-marc.mercier at mpg-partners.com Sat Jan 16 08:01:44 2021
From: jean-marc.mercier at mpg-partners.com (Jean-Marc MERCIER)
Date: Sat, 16 Jan 2021 14:01:44 +0100
Subject: [scikit-learn] An alternative project to scikit-learn for support vector machine learning tools ?
Message-ID: 

Hello, and congratulations on the very nice work done at scikit-learn!

I would like to point out an initiative offering an alternative to scikit-learn's SVM tools for machine learning. We are trying to kick it off; see for instance this link here. Indeed, as practitioners from the private research sector, we felt the need some years ago to craft an alternative approach to SVM learning tools. We use this approach today for industrial applications, and it has proved quite solid and robust.

I thought that this initiative might interest the scikit-learn community. Beyond curiosity, there might be some interest in discussing together: might our ideas be interesting for your community? Could they be merged? So I would like to identify the right people at scikit-learn to discuss these matters with. Could someone help me identify who is in charge of this project so that I can enter a dialogue with them?

-- 
Jean-Marc Mercier
Senior Research Advisor
136 boulevard Haussmann 75008 Paris
Tel +33 1 53 05 98 52
GSM +33 6 77 64 06 85
www.mpg-partners.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From adrin.jalali at gmail.com Sat Jan 16 12:44:09 2021 From: adrin.jalali at gmail.com (Adrin) Date: Sat, 16 Jan 2021 18:44:09 +0100 Subject: [scikit-learn] Renaming the default branch to `main` Message-ID: GitHub now supports renaming the default branch with this done automatically: Renaming a branch will: - Re-target any open pull requests - Update any draft releases based on the branch - Move any branch protection rules that explicitly reference the old name - Update the branch used to build GitHub Pages, if applicable - Show a notice to repository contributors, maintainers, and admins on the repository homepage with instructions to update local copies of the repository - Show a notice to contributors who git push to the old branch - Redirect web requests for the old branch name to the new branch name - Return a "Moved Permanently" response in API requests for the old branch name We have talked in this issue about renaming the branch, but since this is a major change, hence this email to engage and inform the broader community. Cheers, Adrin -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Tue Jan 19 13:16:55 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 19 Jan 2021 19:16:55 +0100 Subject: [scikit-learn] [ANN] scikit-learn 0.24.1 is online! Message-ID: scikit-learn 0.24.1 is out on pypi.org and conda-forge! This is a small maintenance release that fixes the macOS wheels and small bugs in SelfTrainingClassifier and adjusted_mutual_info_score: https://scikit-learn.org/stable/whats_new/v0.24.html#version-0-24-1 You can upgrade with pip as usual: pip install -U scikit-learn The conda-forge builds will be available shortly, which you can then install using: conda install -c conda-forge scikit-learn Thanks again to all the contributors! On behalf of the scikit-learn maintainer team. 
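As a quick sanity check after upgrading (not part of the original announcement), the standard `__version__` attribute shows which release is active in the current environment:

```python
# Confirm the active scikit-learn version after upgrading.
import sklearn

print(sklearn.__version__)  # "0.24.1" if the upgrade above succeeded
```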
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand25mtl at gmail.com Wed Jan 20 17:19:40 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Wed, 20 Jan 2021 17:19:40 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' Message-ID: To whom it may concern, I am trying to install scikit-learn in a PySpark job using the install_pypi_package PySpark API but the install fails with : sc.install_pypi_package("scikit-learn") Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting scipy>=0.19.1 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting threadpoolctl>=2.0.0 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl Building wheels for collected packages: scikit-learn Running setup.py bdist_wheel for scikit-learn: started Running setup.py bdist_wheel for scikit-learn: finished with status 'error' Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37: Partial import of sklearn during the build 
process. Traceback (most recent call last): File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 201, in check_package_status module = importlib.import_module(package) File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 965, in _find_and_load_unlocked ModuleNotFoundError: No module named 'scipy' Traceback (most recent call last): File "", line 1, in File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 306, in setup_package() File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 294, in setup_package check_package_status('scipy', min_deps.SCIPY_MIN_VERSION) File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 227, in check_package_status .format(package, req_str, instructions)) ImportError: scipy is not installed. scikit-learn requires scipy >= 0.19.1. I do not encounter this error with scikit-learn 0.23.2 : sc.install_pypi_package("scikit-learn==0.23.2") Collecting scikit-learn==0.23.2 Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) Installing collected packages: scikit-learn Successfully installed scikit-learn-0.23.2 Could you please help me understand why the scikit-learn 0.24 installation fails ? 
Thank you for your help,
Bertrand
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Wed Jan 20 18:16:05 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 21 Jan 2021 00:16:05 +0100
Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'
In-Reply-To: 
Message-ID: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>

An HTML attachment was scrubbed...
URL: 

From helmrp at yahoo.com Wed Jan 20 18:32:13 2021
From: helmrp at yahoo.com (The Helmbolds)
Date: Wed, 20 Jan 2021 23:32:13 +0000 (UTC)
Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'
In-Reply-To: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>
References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>
Message-ID: <2130832503.2194460.1611185533518@mail.yahoo.com>

Use the Anaconda Python installation.

"You won't find the right answers if you don't ask the right questions!" (Robert Helmbold, 2013)

On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lemaître wrote:

Basically it got the tar with the source and recompiled it instead of using the wheel. Could you force an install from PyPI without using the cached file?

We pushed wheels yesterday for 0.24.1 as well, so it should not get the 0.24.0 version.

For 0.23.2, you can see that it used the wheel (.whl).

Sent from my phone - sorry to be brief and potential misspell.
| From: bertrand25mtl at gmail.com
| Sent: 20 January 2021 23:21
| To: scikit-learn at python.org
| Reply to: scikit-learn at python.org
| Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'

To whom it may concern,

I am trying to install scikit-learn in a PySpark job using the install_pypi_package PySpark API but the install fails with:

sc.install_pypi_package("scikit-learn")

Collecting scikit-learn
Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting scipy>=0.19.1 (from scikit-learn)
Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Building wheels for collected packages: scikit-learn
  Running setup.py bdist_wheel for scikit-learn: started
  Running setup.py bdist_wheel for scikit-learn: finished with status 'error'
Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37:
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 201, in check_package_status
module = importlib.import_module(package)
File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scipy'
Traceback (most recent call last):
File "", line 1, in
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 306, in
setup_package()
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 294, in setup_package
check_package_status('scipy', min_deps.SCIPY_MIN_VERSION)
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 227, in check_package_status
.format(package, req_str, instructions))
ImportError: scipy is not installed. scikit-learn requires scipy >= 0.19.1.

I do not encounter this error with scikit-learn 0.23.2:

sc.install_pypi_package("scikit-learn==0.23.2")

Collecting scikit-learn==0.23.2
Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.23.2

Could you please help me understand why the scikit-learn 0.24 installation fails?
Thank you for your help, Bertrand_______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From marmochiaskl at gmail.com Thu Jan 21 03:24:37 2021 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Thu, 21 Jan 2021 09:24:37 +0100 Subject: [scikit-learn] Monthly meeting January 25th 2021 Message-ID: Dear list, The scikit-learn monthly meeting will take place on Monday January 25th at 8PM UTC: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=01&day=25&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224 While these meetings are mainly for core-devs to discuss the current topics, we are also happy to welcome non-core devs and other project maintainers. Feel free to join, using the following link: https://meet.google.com/xhq-yoga-rtf If you plan to attend and you would like to discuss something specific about your contribution please add your name (or github pseudo) in the " Contributors " section, of the public pad: https://hackmd.io/qVZD8baKRce3uYpto11z0w Best Chiara -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Fri Jan 22 03:49:08 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 09:49:08 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: <2130832503.2194460.1611185533518@mail.yahoo.com> References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: We might experience an issue with PyPI not selecting the manylinux2010 wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 We have to check but we will probably shortly upload manylinux1 wheels that should resolve the issue. 
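As a side note on the mechanism: pip only considers a wheel whose platform tag it supports, which is why an old pip silently falls back to the sdist. A short sketch of inspecting those tags — this assumes the third-party `packaging` library (>= 20) is available; pip vendors the same tag logic internally:

```python
# Sketch: list the binary-wheel platform tags this interpreter accepts.
# Assumes the third-party `packaging` library is installed (pip vendors it).
from packaging.tags import sys_tags

platforms = {tag.platform for tag in sys_tags()}
print(sorted(platforms))
# An old pip (< 19.0) only recognises the legacy manylinux1 platform tag,
# so a *-manylinux2010_*.whl upload is invisible to it and pip falls back
# to the .tar.gz sdist, triggering a source build.
```

If `manylinux2010` is absent from the printed set, that interpreter's pip will never pick the 0.24 wheels.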
I am curious whether fetching the wheel by hand and installing it via `pip` would be a workaround (not practical for automated usage, though). On Thu, 21 Jan 2021 at 00:34, The Helmbolds via scikit-learn < scikit-learn at python.org> wrote: > Use the Anaconda Python installation. > > "You won't find the right answers if you don't ask the right questions!" > (Robert Helmbold, 2013) > > > On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lemaître < > g.lemaitre58 at gmail.com> wrote: > > > Basically it gets the tar with the source and recompiles instead of using > the wheel. Could you force an install from PyPI without using the cached > file? > > We pushed wheels yesterday for 0.24.1 as well, so it should not get the > 0.24.0 version. > > For 0.23.2, you can see that it used the wheel (.whl). > > Sent from my phone - sorry to be brief and potentially misspelled. > *From:* bertrand25mtl at gmail.com > *Sent:* 20 January 2021 23:21 > *To:* scikit-learn at python.org > *Reply to:* scikit-learn at python.org > *Subject:* [scikit-learn] scikit-learn 0.24 installation fails with > ModuleNotFoundError: No module named 'scipy' > > To whom it may concern, > > I am trying to install scikit-learn in a PySpark job using the > install_pypi_package PySpark API but the install fails with : > > sc.install_pypi_package("scikit-learn") > > [...]
> Could you please help me understand why the scikit-learn 0.24 installation > fails ? > > Thank you for your help, > > Bertrand > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.lemaitre58 at gmail.com Fri Jan 22 04:11:43 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 10:11:43 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: @Bertrand Could you tell us which version of `pip` you use? (You need pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014.) On Fri, 22 Jan 2021 at 09:49, Guillaume Lemaître wrote: > We might experience an issue with PyPI not selecting the manylinux2010 > wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 > We have to check, but we will probably shortly upload manylinux1 wheels > that should resolve the issue. > > [...]
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mahmood.nt at gmail.com Fri Jan 22 04:13:15 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Fri, 22 Jan 2021 10:13:15 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable Message-ID: Hi, I have a question about PCA: how can we determine which factor (principal component) best captures a given variable X? For example, a variable may have a low weight in the first PC but a higher weight in the fifth PC. When I use the PCA from scikit-learn, I have to work with the PCs manually, so I may miss the fact that although a variable is weak in the PC1-PC2 plot, it may be strong in the PC4-PC5 plot. Any comment on that? Regards, Mahmood From g.lemaitre58 at gmail.com Fri Jan 22 04:25:54 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 10:25:54 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: I am not really understanding the question, sorry. Are you looking for the `explained_variance_ratio_` attribute, which gives you the relative values of the eigenvalues associated with the eigenvectors? On Fri, 22 Jan 2021 at 10:16, Mahmood Naderan wrote: > Hi > I have a question about PCA [...]
> > Regards, > Mahmood > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From julio at esbet.es Fri Jan 22 05:17:22 2021 From: julio at esbet.es (Julio Antonio Soto) Date: Fri, 22 Jan 2021 11:17:22 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Mahmood, I believe your question is answered here: https://stackoverflow.com/questions/22984335/recovering-features-names-of-explained-variance-ratio-in-pca-with-sklearn > On 22 Jan 2021, at 10:26, Guillaume Lemaître wrote: > > [...]
>> >> Regards, >> Mahmood >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Fri Jan 22 05:50:30 2021 From: niourf at gmail.com (Nicolas Hug) Date: Fri, 22 Jan 2021 10:50:30 +0000 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Mahmood, There are different pieces of information that you can get from PCA: 1. How important a given PC is for reconstructing the entire dataset -> this is given by explained_variance_ratio_, as Guillaume suggested. 2. What the contribution of each feature to each PC is (remember that a PC is a linear combination of all the features, i.e. PC_1 = X_1 . alpha_11 + X_2 . alpha_12 + ... + X_m . alpha_1m). The alpha_ij are what you're looking for, and they are given in the components_ matrix, which is an n_components x n_features matrix. Nicolas On 1/22/21 9:13 AM, Mahmood Naderan wrote: > Hi > I have a question about PCA [...]
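Nicolas's two points can be tried directly with a small sketch; the toy dataset and the argmax summary below are illustrative, not part of his answer:

```python
# Sketch: which principal component captures each feature most strongly?
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=4).fit(X)

# 1. Importance of each PC for reconstructing the dataset:
print(pca.explained_variance_ratio_)

# 2. components_ has shape (n_components, n_features); entry [i, j] is the
#    weight alpha_ij of feature j in PC_i. The PC on which a feature loads
#    most strongly is the argmax of |alpha| over the component axis.
strongest_pc = np.argmax(np.abs(pca.components_), axis=0)
for j, i in enumerate(strongest_pc):
    print(f"feature {j} has its largest weight in PC{i + 1}")
```

This answers Mahmood's concern directly: instead of eyeballing PC1-PC2 or PC4-PC5 plots, the argmax over `components_` reports, per feature, the PC where its weight is largest.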
> > Regards, > Mahmood > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From bertrand25mtl at gmail.com Fri Jan 22 08:37:21 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Fri, 22 Jan 2021 08:37:21 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: Thank you Guillaume for your help, I am using : (running on AWS EMR-6.2) pip3 --version pip 9.0.3 from /usr/lib/python3.7/site-packages (python 3.7) pip3 install scikit-learn Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting scipy>=0.19.1 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn) Installing collected packages: scipy, scikit-learn Running setup.py install for scikit-learn ... error Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-93pagltp/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0ulalx36-record/install-record.txt --single-version-externally-managed --compile: Partial import of sklearn during the build process. 
Traceback (most recent call last): File "/mnt/tmp/pip-build-93pagltp/scikit-learn/sklearn/_build_utils/__init__.py", line 27, in _check_cython_version import Cython ModuleNotFoundError: No module named 'Cython' Upgrading pip to 20.3.3: sudo pip3 install --upgrade pip sudo ln -s /usr/local/bin/pip3 /usr/bin/pip3 pip3 --version pip 20.3.3 from /usr/local/lib/python3.7/site-packages/pip (python 3.7) lets me install from the whl file: pip3 install scikit-learn Collecting scikit-learn Downloading scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB) However, using the API sc.install_pypi_package("scikit-learn") still uses the tar file instead of the whl file (even after the pip upgrade). Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz Thanks for your help, Cheers, Bertrand On Fri, 22 Jan 2021 at 04:13, Guillaume Lemaître wrote: > @Bertrand Could you tell us which version of `pip` you use? (You need > pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014.) > > On Fri, 22 Jan 2021 at 09:49, Guillaume Lemaître > wrote: > >> [...]
> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.lemaitre58 at gmail.com Fri Jan 22 09:04:32 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 15:04:32 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: OK, so the normal install is working. Now, to fix your issue, we need to understand how `sc.install_pypi_package` works and, in particular, how it calls `pip`. We need to make sure that it calls the right pip (the system `pip3` in your case). On Fri, 22 Jan 2021 at 14:39, Bertrand B. wrote: > Thank you Guillaume for your help, > > [...]
>>>> >>>> Thank you for your help, >>>> >>>> Bertrand >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahmood.nt at gmail.com Fri Jan 22 15:48:46 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Fri, 22 Jan 2021 21:48:46 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Thanks for the replies. I read about the available functions in the PCA section. 
Consider the following code x = StandardScaler().fit_transform(x) pca = PCA() principalComponents = pca.fit_transform(x) principalDf = pd.DataFrame(data = principalComponents) loadings = pca.components_ finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) print( "First and second observations\n", finalDf.loc[0:1] ) print( "loadings[0:1]\n", loadings[0], loadings[1] ) print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) The output looks like First and second observations 0 1 2 3 4 kernel 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 loadings[0:1] [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 -0.01257726 0.29718078 0.07493325 0.07562934] explained_variance_ratio_ [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] As you can see for two kernels named ELEC1 and ELEC2, there are five PCs from 0 to 4. Now based on the numbers in the loadings, I expect that loadings[0] which is the first variable is better shown on PC1-PC2 plane (0.49137412,0.46511098). However, loadings[1] which is the second variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). Is this understanding correct? I don't understand what explained_variance_ratio_ is trying to say here. Regards, Mahmood On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > Hi Mahmood, > > There are different pieces of info that you can get from PCA: > > 1. How important is a given PC to reconstruct the entire dataset -> This > is given by explained_variance_ratio_ as Guillaume suggested > > 2. What is the contribution of each feature to each PC (remember that a > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). The alpha_ij are what > you're looking for and they are given in the components_ matrix which is > a n_components x n_features matrix. 
> > Nicolas > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > Hi > > I have a question about PCA and that is, how we can determine, a > > variable, X, is better captured by which factor (principal > > component)? For example, maybe one variable has low weight in the > > first PC but has a higher weight in the fifth PC. > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > therefore, I may miss the point that although a variable is weak in > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > Any comment on that? > > > > Regards, > > Mahmood > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From bertrand25mtl at gmail.com Sat Jan 23 11:16:20 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Sat, 23 Jan 2021 11:16:20 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: Thank you Guillaume for your help, When I start a Spark cluster on AWS, I add a bootstrap step to update pip and install sklearn so that users no longer have to install scikit-learn in their job with sc.install_pypi_package. We are using Spark with sklearn to run hyper-parameter tuning using spark to run many model configurations in parallel (broadcasting the pandas dataframe and running independent models on each Spark container). That is why we need to have scikit learn installed on each worker node. This technique works very well conditional that the pandas dataframe fits in the container memory (each spark container will have a copy of the pandas dataframe). 
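For anyone curious, the pattern described above can be sketched outside Spark as well. The snippet below is a minimal illustration only: a thread pool stands in for the Spark executors, the data and configurations are made up, and in PySpark the same idea would be broadcasting the DataFrame and mapping the configurations over the cluster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Shared data: every worker reads the same arrays (in Spark this would be
# the broadcast pandas DataFrame).
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One independent model configuration per task.
configs = [{"alpha": a} for a in (0.01, 0.1, 1.0, 10.0)]

def fit_one(params):
    # Each task fits and scores one configuration on the shared data.
    model = Ridge(**params).fit(X_tr, y_tr)
    return params, model.score(X_te, y_te)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_one, configs))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, round(best_score, 3))
```

Each task only reads the shared data, so the workers stay independent; the same constraint mentioned above applies, namely that the data must fit in each worker's memory.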
Thank you for your great work and help, Cheers, Bertrand On Fri, Jan 22, 2021 at 09:06, Guillaume Lemaître wrote: > OK, so the normal install is working. Now, to fix your issue we need to > understand how `sc.install_pypi_package` is working and mainly how it > calls `pip`. We need to make sure that it calls the right pip (the system > `pip3` in your case). > > > On Fri, 22 Jan 2021 at 14:39, Bertrand B. wrote: > >> Thank you Guillaume for your help, >> >> I am using : (running on AWS EMR-6.2) >> pip3 --version >> pip 9.0.3 from /usr/lib/python3.7/site-packages (python 3.7) >> >> >> pip3 install scikit-learn >> >> Collecting scikit-learn >> Using cached >> https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz >> Requirement already satisfied: numpy>=1.13.3 in >> /usr/local/lib64/python3.7/site-packages (from scikit-learn) >> Collecting scipy>=0.19.1 (from scikit-learn) >> Using cached >> https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl >> Requirement already satisfied: joblib>=0.11 in >> /usr/local/lib64/python3.7/site-packages (from scikit-learn) >> Requirement already satisfied: threadpoolctl>=2.0.0 in >> /usr/local/lib/python3.7/site-packages (from scikit-learn) >> Installing collected packages: scipy, scikit-learn >> Running setup.py install for scikit-learn ... error >> Complete output from command /usr/bin/python3 -u -c "import >> setuptools, >> tokenize;__file__='/mnt/tmp/pip-build-93pagltp/scikit-learn/setup.py';f=getattr(tokenize, >> 'open', open)(__file__);code=f.read().replace('\r\n', >> '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record >> /tmp/pip-0ulalx36-record/install-record.txt >> --single-version-externally-managed --compile: >> Partial import of sklearn during the build process.
>> Traceback (most recent call last): >> File >> "/mnt/tmp/pip-build-93pagltp/scikit-learn/sklearn/_build_utils/__init__.py", >> line 27, in _check_cython_version >> import Cython >> ModuleNotFoundError: No module named 'Cython' >> >> >> Upgrading pip to 20.3.3 : >> >> sudo pip3 install --upgrade pip >> sudo ln -s /usr/local/bin/pip3 /usr/bin/pip3 >> >> pip3 --version >> pip 20.3.3 from /usr/local/lib/python3.7/site-packages/pip (python 3.7) >> >> let me install from the whl file : >> pip3 install scikit-learn >> Collecting scikit-learn >> Downloading scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl >> (22.3 MB) >> >> However, using the API sc.install_pypi_package("scikit-learn") still uses >> the tar file instead of the whl file (even after the pip upgrade). >> >> Collecting scikit-learn >> Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz >> >> >> Thanks for your help, >> >> Cheers, >> >> Bertrand >> >> Le ven. 22 janv. 2021 ? 04:13, Guillaume Lema?tre >> a ?crit : >> >>> @Bertrand Could you tell us which version of `pip` to you use (you need >>> pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014) >>> >>> On Fri, 22 Jan 2021 at 09:49, Guillaume Lema?tre >>> wrote: >>> >>>> We might experience an issue with PyPI not selecting the manylinux2010 >>>> wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 >>>> We have to check but we will probably shortly upload manylinux1 wheels >>>> that should resolve the issue. >>>> >>>> I am curious if fetching the wheel by hand and installing via `pip` >>>> would be a workaround (not practical for automated usage thought). >>>> >>>> On Thu, 21 Jan 2021 at 00:34, The Helmbolds via scikit-learn < >>>> scikit-learn at python.org> wrote: >>>> >>>>> Use the Anaconda Python installation. >>>>> >>>>> "You won't find the right answers if you don't ask the right >>>>> questions!" 
(Robert Helmbold, 2013) >>>>> >>>>> >>>>> On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lema?tre < >>>>> g.lemaitre58 at gmail.com> wrote: >>>>> >>>>> >>>>> Basically it get the tar with the source and recompile instead of >>>>> using the wheel. Could you force an install from PyPI without using the >>>>> cached file. >>>>> >>>>> We pushed wheels yesterday for 0.24.1 as well so it should not get the >>>>> 0.24.0 version. >>>>> >>>>> For 0.23.2, you can see that it used the wheel (.whl). >>>>> >>>>> Sent from my phone - sorry to be brief and potential misspell. >>>>> *From:* bertrand25mtl at gmail.com >>>>> *Sent:* 20 January 2021 23:21 >>>>> *To:* scikit-learn at python.org >>>>> *Reply to:* scikit-learn at python.org >>>>> *Subject:* [scikit-learn] scikit-learn 0.24 installation fails with >>>>> ModuleNotFoundError: No module named 'scipy' >>>>> >>>>> To whom it may concern, >>>>> >>>>> I am trying to install scikit-learn in a PySpark job using the >>>>> install_pypi_package PySpark API but the install fails with : >>>>> >>>>> sc.install_pypi_package("scikit-learn") >>>>> >>>>> Collecting scikit-learn >>>>> Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz >>>>> Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) >>>>> Collecting scipy>=0.19.1 (from scikit-learn) >>>>> Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl >>>>> Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) >>>>> Collecting threadpoolctl>=2.0.0 (from scikit-learn) >>>>> Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl >>>>> Building wheels for collected 
packages: scikit-learn >>>>> Running setup.py bdist_wheelfor scikit-learn: started >>>>> Running setup.py bdist_wheelfor scikit-learn: finished with status 'error' >>>>> Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ';f=getattr(tokenize, 'open', open)(__file__);code=f.read ().replace('\r\n', '\n');f.close ();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37: >>>>> Partial import of sklearn during the build process. >>>>> Traceback (most recent call last): >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 201, in check_package_status >>>>> module = importlib.import_module(package) >>>>> File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module >>>>> return _bootstrap._gcd_import(name[level:], package, level) >>>>> File "", line 1006, in _gcd_import >>>>> File "", line 983, in _find_and_load >>>>> File "", line 965, in _find_and_load_unlocked >>>>> ModuleNotFoundError: No module named 'scipy' >>>>> Traceback (most recent call last): >>>>> File "", line 1, in >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 306, in >>>>> setup_package() >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 294, in setup_package >>>>> check_package_status('scipy', min_deps.SCIPY_MIN_VERSION) >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 227, in check_package_status >>>>> .format(package, req_str, instructions)) >>>>> ImportError: scipy is not installed. >>>>> scikit-learn requires scipy >= 0.19.1. 
>>>>> >>>>> I do not encounter this error with scikit-learn 0.23.2 : >>>>> >>>>> sc.install_pypi_package("scikit-learn==0.23.2") >>>>> >>>>> Collecting scikit-learn==0.23.2 >>>>> Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl >>>>> Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Installing collected packages: scikit-learn >>>>> Successfully installed scikit-learn-0.23.2 >>>>> >>>>> >>>>> Could you please help me understand why the scikit-learn 0.24 >>>>> installation fails ? 
>>>>> >>>>> Thank you for your help, >>>>> >>>>> Bertrand >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> Scikit-learn @ Inria Foundation >>>> https://glemaitre.github.io/ >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivertomic at zoho.com Sun Jan 24 06:52:57 2021 From: olivertomic at zoho.com (Oliver Tomic) Date: Sun, 24 Jan 2021 12:52:57 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> Hi Mahmood, the information you need is given by the individual explained variance for each variable / feature. You get that information from the hoggorm package (Python): https://github.com/olivertomic/hoggorm https://hoggorm.readthedocs.io/en/latest/index.html
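As a side note, staying within scikit-learn, a rough version of this per-variable view can be computed by hand (a sketch, not hoggorm's exact definition): because the PC scores are uncorrelated, the variance of a standardized variable splits additively across components as explained_variance_[k] * components_[k, j] ** 2.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

# Variance of standardized variable j carried by PC k:
# explained_variance_[k] * components_[k, j] ** 2.
# The scores are uncorrelated, so these contributions simply add up.
contrib = pca.explained_variance_[:, None] * pca.components_ ** 2
frac = contrib / contrib.sum(axis=0)   # normalize: each column sums to 1

best_pc = frac.argmax(axis=0)          # PC that best captures each variable
print(best_pc)
print(np.round(frac.cumsum(axis=0), 3))  # per-variable cumulative explained variance
```

The column-wise argmax then answers the original question directly: for each variable, which component carries most of its variance.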
Here is one of the PCA examples provided in a Jupyter notebook: https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb When you do PCA you get the information by calling for example: cumCalExplVar_individualVariable = model.X_cumCalExplVar() (which gives you the cumulative calibrated explained variance for each variable, cell 21 in the notebook) cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives you the cumulative validated explained variance for each variable, cell 30 in the notebook) The component where you get the biggest jump for the variable of interest is the component you are looking for. You could also have a look at the correlation loadings to identify the component you are looking for. cheers Oliver ---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan wrote ---- Hi Thanks for the replies. I read about the available functions in the PCA section. Consider the following code x = StandardScaler().fit_transform(x) pca = PCA() principalComponents = pca.fit_transform(x) principalDf = pd.DataFrame(data = principalComponents) loadings = pca.components_ finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) print( "First and second observations\n", finalDf.loc[0:1] ) print( "loadings[0:1]\n", loadings[0], loadings[1] ) print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) The output looks like First and second observations 0 1 2 3 4 kernel 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 loadings[0:1] [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 -0.01257726 0.29718078 0.07493325 0.07562934] explained_variance_ratio_ [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] As you can see for two kernels named ELEC1 and ELEC2, there are five PCs from 0 to 4.
Now based on the numbers in the loadings, I expect that loadings[0] which is the first variable is better shown on PC1-PC2 plane (0.49137412,0.46511098). However, loadings[1] which is the second variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). Is this understanding correct? I don't understand what explained_variance_ratio_ is trying to say here. Regards, Mahmood On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > Hi Mahmood, > > There are different pieces of info that you can get from PCA: > > 1. How important is a given PC to reconstruct the entire dataset -> This > is given by explained_variance_ratio_ as Guillaume suggested > > 2. What is the contribution of each feature to each PC (remember that a > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). The alpha_ij are what > you're looking for and they are given in the components_ matrix which is > a n_components x n_features matrix. > > Nicolas > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > Hi > > I have a question about PCA and that is, how we can determine, a > > variable, X, is better captured by which factor (principal > > component)? For example, maybe one variable has low weight in the > > first PC but has a higher weight in the fifth PC. > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > therefore, I may miss the point that although a variable is weak in > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > Any comment on that? 
> > > > Regards, > > Mahmood > > _______________________________________________ > > scikit-learn mailing list > > mailto:scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > mailto:scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list mailto:scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahmood.nt at gmail.com Sun Jan 24 15:37:49 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Sun, 24 Jan 2021 21:37:49 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> References: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> Message-ID: Hi Olivier, Thanks for the suggestion. The package seems to be handy. I will try that. Regards, Mahmood On Sun, Jan 24, 2021 at 12:55 PM Oliver Tomic via scikit-learn wrote: > > Hi Mahmood, > > the information you need is given by the individual explained variance for each variable / feature. 
You get that information from the hoggorm package (Python): > > https://github.com/olivertomic/hoggorm > https://hoggorm.readthedocs.io/en/latest/index.html > > Here is one of the PCA examples provided in a Jupyter notebook: > https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb > > > When you do PCA you get the information by calling for example: > > cumCalExplVar_individualVariable = model.X_cumCalExplVar() (which gives you the cumulative calibrated explained variance for each variable, cell 21 in the notebook) > > cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives you the cumulative validated explained variance variable, cell 30 in the notebook) > > > The component where you get the biggest jump for the variable of interest is the component you are looking for. > > You could also have a look at the correlation loadings to identify the component you are looking for. > > cheers > Oliver > > > > > > > ---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan wrote ---- > > Hi > Thanks for the replies. I read about the available functions in the > PCA section. 
Consider the following code > > x = StandardScaler().fit_transform(x) > pca = PCA() > principalComponents = pca.fit_transform(x) > principalDf = pd.DataFrame(data = principalComponents) > loadings = pca.components_ > finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) > print( "First and second observations\n", finalDf.loc[0:1] ) > print( "loadings[0:1]\n", loadings[0], loadings[1] ) > print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) > > > The output looks like > > First and second observations > 0 1 2 3 4 kernel > 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 > 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 > loadings[0:1] > [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 > -0.01257726 0.29718078 0.07493325 0.07562934] > explained_variance_ratio_ > [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] > > > > As you can see for two kernels named ELEC1 and ELEC2, there are five > PCs from 0 to 4. > Now based on the numbers in the loadings, I expect that loadings[0] > which is the first variable is better shown on PC1-PC2 plane > (0.49137412,0.46511098). However, loadings[1] which is the second > variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). > Is this understanding correct? > > I don't understand what explained_variance_ratio_ is trying to say here. > > > Regards, > Mahmood > > On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > > > Hi Mahmood, > > > > There are different pieces of info that you can get from PCA: > > > > 1. How important is a given PC to reconstruct the entire dataset -> This > > is given by explained_variance_ratio_ as Guillaume suggested > > > > 2. What is the contribution of each feature to each PC (remember that a > > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). 
The alpha_ij are what > > you're looking for and they are given in the components_ matrix which is > > a n_components x n_features matrix. > > > > Nicolas > > > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > > Hi > > > I have a question about PCA and that is, how we can determine, a > > > variable, X, is better captured by which factor (principal > > > component)? For example, maybe one variable has low weight in the > > > first PC but has a higher weight in the fifth PC. > > > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > > therefore, I may miss the point that although a variable is weak in > > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > > > Any comment on that? > > > > > > Regards, > > > Mahmood > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From rdslater at gmail.com Sun Jan 31 14:43:32 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 13:43:32 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) Message-ID: I was writing an example for my students when I came across what I think is an issue. In version 0.24.1 using the LassoCV, the .coef variable should have a list of my coefficients (at least according to my understanding of the documents).
However, the variable is not populated and throws an error 'LassoCV' object has no attribute 'coef' I do have a .coef_ variable which I believe is the coefficient for the best fit only. the alphas and alphas_ variables have a similar issue in that alphas returns nothing while alphas_ returns the list of alphas used. I'm not sure if this is a documentation oversight or a real issue but wanted to get clarification. I can get what I need from other methods, but wanted to see if this needed to be addressed. Best Regards, Robert Slater -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:00:34 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:00:34 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: Hi Robert, > I do have a .coef_ variable which I believe is the coefficient for the best fit only. `coef` never existed. Fitted attributes always end with underscore. We do not store coefficients for all fitted `alphas_`. We provide some information regarding the MSE path for all tried alphas: https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html > the alphas and alphas_ variables have a similar issue in that alphas returns nothing while alphas_ returns the list of alphas used. You probably created a model such as `model = LassoCV()`. By default, the parameter `alphas=None`, thus accessing it will return None. After fitting, `alphas_` will be automatically created as specified in the documentation. It will correspond to the values tried by cross-validation. If instead, you are passing an array to `alphas`, then `alphas_` will be the same as the `alphas` you passed, after calling `fit`. Cheers, On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: > I was writing an example for my students when I came across what I think
In version 24.1 using the LassoCV, the .coef variable > should have a list of my coeficeients (at least according to my > understanding of the documents). However, the variable is not populated > nad throws an error > > 'LassoCV' object has no attribute 'coef' > > > I do have a .coef_ variable which I believe is the coefficient for the > best fit only. > > the alphas and alphas_ variables have a similar issue in that alphas > returns nothing while alphas_ returns the list of alphas used. > > I'm not sure if this is an documentation oversight or a real issue but > wanted to get clarification. > > I can get what I need from o ther methods, but wanted to see if this > needed to be addressed. > > Best Regards, > > Robert Slater > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdslater at gmail.com Sun Jan 31 15:22:16 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 14:22:16 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: Appreciate the clarification. I definitely think the docs need some polish as coef_ only returns a single fitting of coefficients and not the coefficients along the path as stated in the api guide. I am seeing alpha_ alphas_ coef_ dual_gap_ as fitted variables (plus a few more) which is slightly different than the guide/api docs (all the names are plural in the api guide) I don't know if there is way to contribute an edit to the docs, I'd be more than happy to do it (Sorry I'm very OCD about such things, and I know this is a minor details)., I'd be happy to suggest the edit through proper channels. 
On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre wrote: > Hi Robert, > > > I do have a .coef_ variable which I believe is the coefficient for the > best fit only. > > `coef` never existed. Fitted attributes always end with underscore. > We do not store coefficients for all fitted `alphas_`. > We provide some information regarding the MSE path for all tried alphas: > https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html > > > the alphas and alphas_ variables have a similar issue in that alphas > returns nothing while alphas_ returns the list of alphas used. > > You probably created a model such as `model = LasssoCV()`. By default, the > parameter `alpha=None` thus accessing it will return None. After fitting, > `alphas_` will be automatically created as specified in the documentation. > It will correspond to the values tried by cross-validation. > If instead, you are passing an array to `alphas` then `alphas_` will be > the same as `alphas_` after calling `fit`. > > Cheers, > > > On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: > >> I was writing an example for my students when I came across what I think >> is an issue. In version 24.1 using the LassoCV, the .coef variable >> should have a list of my coeficeients (at least according to my >> understanding of the documents). However, the variable is not populated >> nad throws an error >> >> 'LassoCV' object has no attribute 'coef' >> >> >> I do have a .coef_ variable which I believe is the coefficient for the >> best fit only. >> >> the alphas and alphas_ variables have a similar issue in that alphas >> returns nothing while alphas_ returns the list of alphas used. >> >> I'm not sure if this is an documentation oversight or a real issue but >> wanted to get clarification. >> >> I can get what I need from o ther methods, but wanted to see if this >> needed to be addressed. 
>> >> Best Regards, >> >> Robert Slater >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:37:19 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:37:19 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > Appreciate the clarification. I definitely think the docs need some > polish, as coef_ only returns a single fitting of coefficients and not the > coefficients along the path as stated in the API guide. > I am confused here. LassoCV states: *coef : *ndarray of shape (n_features,) or (n_targets, n_features) Parameter vector (w in the cost function formula). So it seems to be exactly what it is returning. It does not return the coefficients along the path. Which documentation are you referring to when stating the API guide (if you could provide a link, it would be really helpful)? > I am seeing > > alpha_ > alphas_ > coef_ > dual_gap_ > > as fitted variables (plus a few more) which is slightly different from the > guide/API docs (all the names are plural in the API guide) > > I don't know if there is a way to contribute an edit to the docs, I'd be > more than happy to do it (sorry, I'm very OCD about such things, and I know > this is a minor detail). I'd be happy to suggest the edit through proper > channels.
> You can always open a PR in the GitHub scikit-learn repository because the documentation is actually the docstring from the classes and functions. The user guide documentation is located in the /doc folder and the contributing guide will be helpful to start with: https://scikit-learn.org/stable/developers/contributing.html > > On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre > wrote: > >> Hi Robert, >> >> > I do have a .coef_ variable which I believe is the coefficient for the >> best fit only. >> >> `coef` never existed. Fitted attributes always end with underscore. >> We do not store coefficients for all fitted `alphas_`. >> We provide some information regarding the MSE path for all tried alphas: >> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >> >> > the alphas and alphas_ variables have a similar issue in that alphas >> returns nothing while alphas_ returns the list of alphas used. >> >> You probably created a model such as `model = LasssoCV()`. By default, >> the parameter `alpha=None` thus accessing it will return None. After >> fitting, >> `alphas_` will be automatically created as specified in the >> documentation. It will correspond to the values tried by cross-validation. >> If instead, you are passing an array to `alphas` then `alphas_` will be >> the same as `alphas_` after calling `fit`. >> >> Cheers, >> >> >> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >> >>> I was writing an example for my students when I came across what I think >>> is an issue. In version 24.1 using the LassoCV, the .coef variable >>> should have a list of my coeficeients (at least according to my >>> understanding of the documents). However, the variable is not populated >>> nad throws an error >>> >>> 'LassoCV' object has no attribute 'coef' >>> >>> >>> I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. 
>>> the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> I'm not sure if this is a documentation oversight or a real issue but >>> wanted to get clarification. >>> >>> I can get what I need from other methods, but wanted to see if this >>> needed to be addressed. >>> >>> Best Regards, >>> >>> Robert Slater >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:38:20 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:38:20 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 at 21:37, Guillaume Lemaître wrote: > > > On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > >> Appreciate the clarification. I definitely think the docs need some >> polish, as coef_ only returns a single fitting of coefficients and not the >> coefficients along the path as stated in the API guide. >> > > I am confused here. LassoCV states: > > *coef : *ndarray of shape (n_features,) or (n_targets, n_features) > Oops, `coef_` indeed (I messed up the copy-paste) > Parameter vector (w in the cost function formula).
> So it seems exactly what it is returning. It does not return the > coefficients along the path. > Which documentation are you referring to when stating the API guide (if > you could provide a link, it would be really helpful)? > > >> I am seeing >> >> alpha_ >> alphas_ >> coef_ >> dual_gap_ >> >> as fitted variables (plus a few more) which is slightly different than >> the guide/api docs (all the names are plural in the api guide) >> >> I don't know if there is way to contribute an edit to the docs, I'd be >> more than happy to do it (Sorry I'm very OCD about such things, and I know >> this is a minor details)., I'd be happy to suggest the edit through proper >> channels. >> > > You can always open a PR in the GitHub scikit-learn repository because the > documentation is actually the docstring from the classes and functions. > The user guide documentation is located in the /doc folder and the > contributing guide will be helpful to start with: > https://scikit-learn.org/stable/developers/contributing.html > > >> >> On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> Hi Robert, >>> >>> > I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. >>> >>> `coef` never existed. Fitted attributes always end with underscore. >>> We do not store coefficients for all fitted `alphas_`. >>> We provide some information regarding the MSE path for all tried alphas: >>> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >>> >>> > the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> You probably created a model such as `model = LasssoCV()`. By default, >>> the parameter `alpha=None` thus accessing it will return None. After >>> fitting, >>> `alphas_` will be automatically created as specified in the >>> documentation. 
It will correspond to the values tried by cross-validation. >>> If instead, you are passing an array to `alphas` then `alphas_` will be >>> the same as `alphas_` after calling `fit`. >>> >>> Cheers, >>> >>> >>> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >>> >>>> I was writing an example for my students when I came across what I >>>> think is an issue. In version 24.1 using the LassoCV, the .coef >>>> variable should have a list of my coeficeients (at least according to my >>>> understanding of the documents). However, the variable is not populated >>>> nad throws an error >>>> >>>> 'LassoCV' object has no attribute 'coef' >>>> >>>> >>>> I do have a .coef_ variable which I believe is the coefficient for the >>>> best fit only. >>>> >>>> the alphas and alphas_ variables have a similar issue in that alphas >>>> returns nothing while alphas_ returns the list of alphas used. >>>> >>>> I'm not sure if this is an documentation oversight or a real issue but >>>> wanted to get clarification. >>>> >>>> I can get what I need from o ther methods, but wanted to see if this >>>> needed to be addressed. 
>>>> >>>> Best Regards, >>>> >>>> Robert Slater >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdslater at gmail.com Sun Jan 31 15:45:58 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 14:45:58 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: OK, it's on me--I was reading the return objects for the path method. My apologies. On Sun, Jan 31, 2021 at 2:38 PM Guillaume Lemaître wrote: > > > On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > >> Appreciate the clarification. I definitely think the docs need some >> polish, as coef_ only returns a single fitting of coefficients and not the >> coefficients along the path as stated in the API guide. >> > > I am confused here. LassoCV states: > > *coef : *ndarray of shape (n_features,) or (n_targets, n_features) > > Parameter vector (w in the cost function formula). > So it seems to be exactly what it is returning. It does not return the > coefficients along the path. > Which documentation are you referring to when stating the API guide (if > you could provide a link, it would be really helpful)?
> > >> I am seeing >> >> alpha_ >> alphas_ >> coef_ >> dual_gap_ >> >> as fitted variables (plus a few more) which is slightly different than >> the guide/api docs (all the names are plural in the api guide) >> >> I don't know if there is way to contribute an edit to the docs, I'd be >> more than happy to do it (Sorry I'm very OCD about such things, and I know >> this is a minor details)., I'd be happy to suggest the edit through proper >> channels. >> > > You can always open a PR in the GitHub scikit-learn repository because the > documentation is actually the docstring from the classes and functions. > The user guide documentation is located in the /doc folder and the > contributing guide will be helpful to start with: > https://scikit-learn.org/stable/developers/contributing.html > > >> >> On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> Hi Robert, >>> >>> > I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. >>> >>> `coef` never existed. Fitted attributes always end with underscore. >>> We do not store coefficients for all fitted `alphas_`. >>> We provide some information regarding the MSE path for all tried alphas: >>> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >>> >>> > the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> You probably created a model such as `model = LasssoCV()`. By default, >>> the parameter `alpha=None` thus accessing it will return None. After >>> fitting, >>> `alphas_` will be automatically created as specified in the >>> documentation. It will correspond to the values tried by cross-validation. >>> If instead, you are passing an array to `alphas` then `alphas_` will be >>> the same as `alphas_` after calling `fit`. 
>>> >>> Cheers, >>> >>> >>> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >>> >>>> I was writing an example for my students when I came across what I >>>> think is an issue. In version 24.1 using the LassoCV, the .coef >>>> variable should have a list of my coeficeients (at least according to my >>>> understanding of the documents). However, the variable is not populated >>>> nad throws an error >>>> >>>> 'LassoCV' object has no attribute 'coef' >>>> >>>> >>>> I do have a .coef_ variable which I believe is the coefficient for the >>>> best fit only. >>>> >>>> the alphas and alphas_ variables have a similar issue in that alphas >>>> returns nothing while alphas_ returns the list of alphas used. >>>> >>>> I'm not sure if this is an documentation oversight or a real issue but >>>> wanted to get clarification. >>>> >>>> I can get what I need from o ther methods, but wanted to see if this >>>> needed to be addressed. >>>> >>>> Best Regards, >>>> >>>> Robert Slater >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL:
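The mix-up resolved above (the `path` method's documented return values versus the fitted `coef_` attribute) can be sketched as follows. The data and the `n_alphas` value are illustrative assumptions of mine, not from the thread.

```python
# Sketch (illustrative data, n_alphas chosen arbitrarily): lasso_path returns
# the coefficients along the whole regularization path, while LassoCV.coef_
# holds only the single vector for the selected alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, lasso_path

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

alphas, coefs, dual_gaps = lasso_path(X, y, n_alphas=20)
print(alphas.shape)  # (20,)   one alpha per step on the path
print(coefs.shape)   # (5, 20) one coefficient vector per alpha

model = LassoCV(cv=5).fit(X, y)
print(model.coef_.shape)  # (5,) a single vector, for model.alpha_ only
```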