From solegalli at protonmail.com Mon Jan 4 12:56:52 2021
From: solegalli at protonmail.com (Sole Galli)
Date: Mon, 04 Jan 2021 17:56:52 +0000
Subject: [scikit-learn] IterativeImputer
Message-ID: 

Hello team,

I am reading in some of the original MICE articles that, supposedly, each variable should be modelled upon the other variables in the data with a suitable model. So, for example, if the variable with NA is binary it should be modelled with a classifier, and if continuous with a regression model.

Am I correct in understanding that this is not yet possible with the IterativeImputer, because the model set in the estimator parameter is used for all variables? Is there a workaround?

Thanks a lot!

Regards,
Soledad Galli
https://www.trainindata.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Tue Jan 5 03:34:27 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Tue, 5 Jan 2021 09:34:27 +0100
Subject: [scikit-learn] Comparing Scikit and Xlstat for PCA analysis
In-Reply-To: 
References: 
Message-ID: 

Yes:

svd_solver : {'auto', 'full', 'arpack', 'randomized'}, default='auto'

If auto: the solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient 'randomized' method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

If full: run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing.

If arpack: run SVD truncated to n_components calling the ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape).

If randomized: run randomized SVD by the method of Halko et al.

New in version 0.18.0.
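As an illustration of the options above, a minimal runnable sketch (the data here is made-up random noise, used only for illustration; component signs may differ between solvers, but the explained variance should agree):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data, made up purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

for solver in ("full", "arpack", "randomized"):
    # 'arpack' requires strictly 0 < n_components < min(X.shape),
    # so n_components=2 is valid for a 100x5 matrix.
    pca = PCA(n_components=2, svd_solver=solver, random_state=0)
    pca.fit(X)
    print(solver, pca.explained_variance_ratio_.round(4))
```

Up to sign flips of the components, all three explicit solvers recover the same subspace on a small dense matrix like this one.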
On Mon, 28 Dec 2020 at 17:54, Mahmood Naderan wrote:
> Hi Guillaume,
> Thanks for the reply. May I know if I can choose different solvers in the scikit package or not.
>
> Regards,
> Mahmood
>
> On Mon, Dec 28, 2020 at 4:30 PM Guillaume Lemaître wrote:
>
>> n_components set to 'auto' is a strategy that will pick the number of components. The sign of the PC does not matter so much since they are still orthogonal. So change will depend of the solver that should be different in both software.
>>
>> Sent from my phone - sorry to be brief and potential misspell.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From glennmschultz at me.com Tue Jan 5 12:24:53 2021
From: glennmschultz at me.com (Glenn Schultz)
Date: Tue, 5 Jan 2021 11:24:53 -0600
Subject: [scikit-learn] extraction of grid search values
Message-ID: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>

All,

I have a grid search of a gradient boosting classifier. All works well: the best model is extracted and predict works on the model. I would like to extract the cv_results_. My set-up is pretty standard:

gbclassifier = GridSearchCV(GradientBoostingClassifier(),
                            parameters,
                            verbose = 5,
                            n_jobs = 5,
                            cv = ShuffleSplit(n_splits = 5, test_size = .2, random_state = 42),
                            refit = True,
                            scoring = 'roc_auc')

print(gbclassifier.cv_results_)

returns an attribute error: 'Gradient Boosting Classifier' has no attribute cv_results. I am not sure what I am doing wrong. I checked the documentation and followed some SO examples but no progress.
I am missing something; any help is appreciated.

Best,
Glenn

From niourf at gmail.com Tue Jan 5 15:58:56 2021
From: niourf at gmail.com (Nicolas Hug)
Date: Tue, 5 Jan 2021 20:58:56 +0000
Subject: [scikit-learn] extraction of grid search values
In-Reply-To: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>
References: <1813D479-E00F-4DB0-9BAF-64D2C007E8CD@me.com>
Message-ID: 

Glenn,

You need to fit the estimator with some data for the cv_results_ attribute to exist. You may refer to https://scikit-learn.org/stable/getting_started.html

Nicolas

On Tue, 5 Jan 2021 at 17:25, Glenn Schultz via scikit-learn <scikit-learn at python.org> wrote:

> All,
>
> I have a grid search of gradient boosting classifier. All works well the best model is extracted and predict works on the model. I would like to extract the cv_results_ My set-up is pretty standard
>
> gbclassifier = GridSearchCV(GradientBoostingClassifier(),
>                             parameters,
>                             verbose = 5,
>                             n_jobs = 5,
>                             cv = ShuffleSplit(n_splits = 5, test_size = .2, random_state = 42),
>                             refit = True,
>                             scoring = 'roc_auc')
>
> print(gbclassifier.cv_results_)
>
> returns an attribute error 'Gradient Boosting Classifier' has no attribute cv_results. I am not sure what I am doing wrong I checked the documentation and followed some SO examples but no progress. I am missing something any help is appreciated.
>
> Best,
> Glenn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From icefrog1950 at gmail.com Wed Jan 6 06:00:46 2021
From: icefrog1950 at gmail.com (Liu James)
Date: Wed, 6 Jan 2021 19:00:46 +0800
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
Message-ID: 

Hi all,

I'm using a medium-sized dataset, KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset), for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with the log "Process 13851(python) of user xxx dumped core. Stack trace .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".

The hardware: CentOS 8, Intel i9, 128GB RAM; the stack size is set unlimited. The crash can be reproduced.

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ahowe42 at gmail.com Wed Jan 6 06:46:36 2021
From: ahowe42 at gmail.com (Andrew Howe)
Date: Wed, 6 Jan 2021 11:46:36 +0000
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capacity. Perhaps the full stack trace would be helpful.

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile
ResearchGate Profile
Open Researcher and Contributor ID (ORCID)
Github Profile
Personal Website
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote:

> Hi all,
>
> I'm using a medium dataset KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with log "Process 13851(python) of user xxx dumped core.
Stack trace
> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>
> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set unlimited. Such crash can be reproduced.
>
> Thanks.
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Wed Jan 6 09:31:38 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 6 Jan 2021 15:31:38 +0100
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

And it seems that the piece of traceback refers to NumPy.

On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote:

> A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capacity. Perhaps the full stack trace would be helpful.
>
> Andrew
>
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> J. Andrew Howe, PhD
> LinkedIn Profile
> ResearchGate Profile
> Open Researcher and Contributor ID (ORCID)
> Github Profile
> Personal Website
> I live to learn, so I can learn to live. - me
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>
> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote:
>
>> Hi all,
>>
>> I'm using a medium dataset KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with log "Process 13851(python) of user xxx dumped core. Stack trace .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>>
>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set unlimited.
>> Such crash can be reproduced.
>>
>> Thanks.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From icefrog1950 at gmail.com Fri Jan 8 00:33:34 2021
From: icefrog1950 at gmail.com (Liu James)
Date: Fri, 8 Jan 2021 13:33:34 +0800
Subject: [scikit-learn] 2 million samples dataset caused python and OS crash
In-Reply-To: 
References: 
Message-ID: 

Thanks for the reply. I tested different sizes of data on different distros, and found that when the data is over 500 thousand rows (with 50 columns) the crash happens with the same error message -- a kernel page error.

On Wed, 6 Jan 2021 at 10:33 PM, Guillaume Lemaître wrote:
- me >> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >> >> >> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote: >> >>> Hi all, >>> >>> I'm using a medium dataset KDD99 IDS( >>> https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) >>> for model training, and the dataset has 2 million samples. When using >>> fit_transform(), the OS crashed with log "Process 13851(python) of user xxx >>> dumped core. Stack trace >>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ". >>> >>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set >>> unlimited. Such crash can be reproduced. >>> >>> Thanks. >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahowe42 at gmail.com Fri Jan 8 04:09:54 2021 From: ahowe42 at gmail.com (Andrew Howe) Date: Fri, 8 Jan 2021 09:09:54 +0000 Subject: [scikit-learn] 2 million samples dataset caused python and OS crash In-Reply-To: References: Message-ID: Doesn't seem like a sklearn issue, but an OS / hardware issue. Again, a full stack trace would be useful information. Either way, you can try training on a sample or via cross-validation. I believe some estimators can also use incremental training. Andrew <~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. 
Andrew Howe, PhD LinkedIn Profile ResearchGate Profile Open Researcher and Contributor ID (ORCID) Github Profile Personal Website I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~> On Fri, Jan 8, 2021 at 5:35 AM Liu James wrote: > Thanks for reply. I tested different size of data on different distros > ,and found when data is over 500 thousand rows (with 50 columns), the crash > will happened with same error message -- kernel page error. > > Guillaume Lema?tre ?2021?1?6??? ??10:33??? > >> And it seems that the piece of traceback refer to NumPy. >> >> On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote: >> >>> A core dump generally happens when a process tries to access memory >>> outside it's allocated address space. You've not specified what estimator >>> you were using, but I'd guess it attempted to do something with the dataset >>> that resulted in it being duplicated or otherwise expanded beyond the >>> memory capacity. Perhaps the full stack trace would be helpful. >>> >>> Andrew >>> >>> >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> J. Andrew Howe, PhD >>> LinkedIn Profile >>> ResearchGate Profile >>> Open Researcher and Contributor ID (ORCID) >>> >>> Github Profile >>> Personal Website >>> I live to learn, so I can learn to live. - me >>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~> >>> >>> >>> On Wed, Jan 6, 2021 at 11:02 AM Liu James wrote: >>> >>>> Hi all, >>>> >>>> I'm using a medium dataset KDD99 IDS( >>>> https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset) >>>> for model training, and the dataset has 2 million samples. When using >>>> fit_transform(), the OS crashed with log "Process 13851(python) of user xxx >>>> dumped core. Stack trace >>>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ". >>>> >>>> The hardware: Centos 8, Intel i9, 128GB RAM, stack size is set >>>> unlimited. Such crash can be reproduced. >>>> >>>> Thanks. 
>>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reshama.stat at gmail.com Thu Jan 14 07:57:40 2021 From: reshama.stat at gmail.com (Reshama Shaikh) Date: Thu, 14 Jan 2021 07:57:40 -0500 Subject: [scikit-learn] [Data Umbrella AFME sprint] share Message-ID: Hello, There is an upcoming scikit-learn open source sprint to increase participation of folks in the **Africa and Middle East** regions. If you are located in **Africa and Middle East**, or have contacts there, please share: Data Umbrella has organized a scikit-learn open source sprint for 06-Feb-2021, with a focus on **Africa and Middle East** regions. A sprint is a 4-hour online hackathon where data scientists / developers will work with a pair programming partner on a beginner-friendly issue in the scikit-learn repo. Some knowledge of python, scikit-learn and machine learning is required. This sprint is an excellent opportunity to increase machine learning and python skills, get mentorship from core developers of the library and get started in contributing to open source. 
Full details are available here: https://afme2021.dataumbrella.org

Also, here are social media links that can be shared:
- Twitter [a]
- LinkedIn [b]
- Facebook [c]

[a] https://twitter.com/DataUmbrella/status/1346486322958131202
[b] https://www.linkedin.com/feed/update/urn:li:activity:6752255120714579968/
[c] https://www.facebook.com/data.umbrella.dei/photos/a.156775909179975/432596991597864/

Application deadline: 22-January-2021

We are happy to answer any questions. They can be sent to: data.umbrella.dei at gmail.com

Best,
Reshama
---
Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matematica.a3k at gmail.com Fri Jan 15 19:41:50 2021
From: matematica.a3k at gmail.com (Matemática A3K)
Date: Fri, 15 Jan 2021 19:41:50 -0500
Subject: [scikit-learn] [ANN] The covid-ht project
Message-ID: 

From https://covid-ht.herokuapp.com/about:

According to Dr. Eugenia Barrientos[1], an ongoing viral infection can be detected from the results of a hemogram test, and, given the current COVID19 pandemic, all viral infections with cold and flu symptoms should be treated as COVID19 cases. The inference from the hemogram test results is made based on the knowledge and experience of the Health Professional. If that process could be automated and made widely available, the detection toolkit of Health Professionals would be improved.

In many places (e.g. Perú) where specific COVID19 testing is not widely available - saturated hospitals, not affordable or unavailable - hemogram blood testing is the opposite: affordable and offered in widely distributed facilities. If a viral infection classifier with adequate accuracy on hemograms can be built and made publicly available, all Health Professionals with a smartphone and Internet access could classify any hemogram with the same accuracy as top-level experts on the matter.
Early detection is deemed to be the greatest success factor in COVID19 treatments. This project aims to provide a tool to efficiently build and manage that classifier and make it effectively available for widespread use in order to improve detection and increase the use efficiency of specific testing of COVID19. This tool is totally transparent: you may audit it entirely to fully understand how it works, what it provides and its limitations. It is distributed under the GNU LGPLv3 license. Improvements in early detection should increase successful treatments, potentially saving lives. Better resource efficiency can also be achieved with the tool, i.e. only use expensive specific COVID19 testing for recovery after the hemogram does not indicate infection. The tool is not a replacement of Health Professionals. Any diagnostic and treatment should be decided by a Health Professional with the patient. If you are an individual with a recent hemogram result, the tool may indicate to take preemptive care and seek a Health Professional. Also, don't blame the knife providers: This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Everybody is welcome to join the community for building it and use it: covid-ht+subscribe at googlegroups.com and https://github.com/math-a3k/covid-ht . Made with love for all humans of the world. [1] https://youtu.be/ZO6EaAz465Y?t=570 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From jean-marc.mercier at mpg-partners.com Sat Jan 16 08:01:44 2021
From: jean-marc.mercier at mpg-partners.com (Jean-Marc MERCIER)
Date: Sat, 16 Jan 2021 14:01:44 +0100
Subject: [scikit-learn] An alternative project to scikit-learn for support vector machine learning tools ?
Message-ID: 

Hello, and congratulations on the very nice work done at scikit-learn!

I would like to point out an initiative offering an alternative to scikit-learn's SVM tools for machine learning. We are trying to kick it off; see for instance this link here. Indeed, as practitioners from the private research sector, we felt the need some years ago to craft an alternative approach to SVM learning tools. We use this approach today for industrial applications, and it has proved quite solid and robust.

I thought that this initiative might interest the scikit-learn community. Beyond curiosity, there might be some interest in discussing together: might our ideas be interesting for your community? Could they be merged? So I would like to identify the right people at scikit-learn to discuss these matters with. Could someone help me identify who is in charge of this project so that I can enter a dialogue with them?

-- 
Jean-Marc Mercier
Senior Research Advisor
136 boulevard Haussmann 75008 Paris
Tel +33 1 53 05 98 52
GSM +33 6 77 64 06 85
www.mpg-partners.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From adrin.jalali at gmail.com Sat Jan 16 12:44:09 2021 From: adrin.jalali at gmail.com (Adrin) Date: Sat, 16 Jan 2021 18:44:09 +0100 Subject: [scikit-learn] Renaming the default branch to `main` Message-ID: GitHub now supports renaming the default branch with this done automatically: Renaming a branch will: - Re-target any open pull requests - Update any draft releases based on the branch - Move any branch protection rules that explicitly reference the old name - Update the branch used to build GitHub Pages, if applicable - Show a notice to repository contributors, maintainers, and admins on the repository homepage with instructions to update local copies of the repository - Show a notice to contributors who git push to the old branch - Redirect web requests for the old branch name to the new branch name - Return a "Moved Permanently" response in API requests for the old branch name We have talked in this issue about renaming the branch, but since this is a major change, hence this email to engage and inform the broader community. Cheers, Adrin -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Tue Jan 19 13:16:55 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Tue, 19 Jan 2021 19:16:55 +0100 Subject: [scikit-learn] [ANN] scikit-learn 0.24.1 is online! Message-ID: scikit-learn 0.24.1 is out on pypi.org and conda-forge! This is a small maintenance release that fixes the macOS wheels and small bugs in SelfTrainingClassifier and adjusted_mutual_info_score: https://scikit-learn.org/stable/whats_new/v0.24.html#version-0-24-1 You can upgrade with pip as usual: pip install -U scikit-learn The conda-forge builds will be available shortly, which you can then install using: conda install -c conda-forge scikit-learn Thanks again to all the contributors! On behalf of the scikit-learn maintainer team. 
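As a quick sanity check after upgrading (not part of the original announcement), the standard `__version__` attribute shows which release is active in the current environment:

```python
# Confirm the active scikit-learn version after upgrading.
import sklearn

print(sklearn.__version__)  # "0.24.1" if the upgrade above succeeded
```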
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bertrand25mtl at gmail.com Wed Jan 20 17:19:40 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Wed, 20 Jan 2021 17:19:40 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' Message-ID: To whom it may concern, I am trying to install scikit-learn in a PySpark job using the install_pypi_package PySpark API but the install fails with : sc.install_pypi_package("scikit-learn") Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting scipy>=0.19.1 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting threadpoolctl>=2.0.0 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl Building wheels for collected packages: scikit-learn Running setup.py bdist_wheel for scikit-learn: started Running setup.py bdist_wheel for scikit-learn: finished with status 'error' Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37: Partial import of sklearn during the build 
process. Traceback (most recent call last): File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 201, in check_package_status module = importlib.import_module(package) File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 965, in _find_and_load_unlocked ModuleNotFoundError: No module named 'scipy' Traceback (most recent call last): File "", line 1, in File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 306, in setup_package() File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 294, in setup_package check_package_status('scipy', min_deps.SCIPY_MIN_VERSION) File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 227, in check_package_status .format(package, req_str, instructions)) ImportError: scipy is not installed. scikit-learn requires scipy >= 0.19.1. I do not encounter this error with scikit-learn 0.23.2 : sc.install_pypi_package("scikit-learn==0.23.2") Collecting scikit-learn==0.23.2 Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) Installing collected packages: scikit-learn Successfully installed scikit-learn-0.23.2 Could you please help me understand why the scikit-learn 0.24 installation fails ? 
Thank you for your help,
Bertrand
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Wed Jan 20 18:16:05 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 21 Jan 2021 00:16:05 +0100
Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'
In-Reply-To: 
Message-ID: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>

An HTML attachment was scrubbed...
URL: 

From helmrp at yahoo.com Wed Jan 20 18:32:13 2021
From: helmrp at yahoo.com (The Helmbolds)
Date: Wed, 20 Jan 2021 23:32:13 +0000 (UTC)
Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'
In-Reply-To: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>
References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com>
Message-ID: <2130832503.2194460.1611185533518@mail.yahoo.com>

Use the Anaconda Python installation.

"You won't find the right answers if you don't ask the right questions!" (Robert Helmbold, 2013)

On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lemaître wrote:

Basically it got the tar with the source and recompiled it instead of using the wheel. Could you force an install from PyPI without using the cached file?

We pushed wheels yesterday for 0.24.1 as well, so it should not get the 0.24.0 version.

For 0.23.2, you can see that it used the wheel (.whl).

Sent from my phone - sorry to be brief and potential misspell.
| From: bertrand25mtl at gmail.com
| Sent: 20 January 2021 23:21
| To: scikit-learn at python.org
| Reply to: scikit-learn at python.org
| Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy'

To whom it may concern,

I am trying to install scikit-learn in a PySpark job using the install_pypi_package PySpark API but the install fails with:

sc.install_pypi_package("scikit-learn")

Collecting scikit-learn
Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting scipy>=0.19.1 (from scikit-learn)
Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Building wheels for collected packages: scikit-learn
  Running setup.py bdist_wheel for scikit-learn: started
  Running setup.py bdist_wheel for scikit-learn: finished with status 'error'
Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37:
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 201, in check_package_status
module = importlib.import_module(package)
File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scipy'
Traceback (most recent call last):
File "", line 1, in
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 306, in
setup_package()
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 294, in setup_package
check_package_status('scipy', min_deps.SCIPY_MIN_VERSION)
File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py", line 227, in check_package_status
.format(package, req_str, instructions))
ImportError: scipy is not installed. scikit-learn requires scipy >= 0.19.1.

I do not encounter this error with scikit-learn 0.23.2:

sc.install_pypi_package("scikit-learn==0.23.2")

Collecting scikit-learn==0.23.2
Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl
Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2)
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.23.2

Could you please help me understand why the scikit-learn 0.24 installation fails?
Thank you for your help, Bertrand_______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From marmochiaskl at gmail.com Thu Jan 21 03:24:37 2021 From: marmochiaskl at gmail.com (Chiara Marmo) Date: Thu, 21 Jan 2021 09:24:37 +0100 Subject: [scikit-learn] Monthly meeting January 25th 2021 Message-ID: Dear list, The scikit-learn monthly meeting will take place on Monday January 25th at 8PM UTC: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=01&day=25&hour=20&min=0&sec=0&p1=179&p2=240&p3=195&p4=224 While these meetings are mainly for core-devs to discuss the current topics, we are also happy to welcome non-core devs and other project maintainers. Feel free to join, using the following link: https://meet.google.com/xhq-yoga-rtf If you plan to attend and you would like to discuss something specific about your contribution please add your name (or github pseudo) in the " Contributors " section, of the public pad: https://hackmd.io/qVZD8baKRce3uYpto11z0w Best Chiara -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Fri Jan 22 03:49:08 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 09:49:08 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: <2130832503.2194460.1611185533518@mail.yahoo.com> References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: We might experience an issue with PyPI not selecting the manylinux2010 wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 We have to check but we will probably shortly upload manylinux1 wheels that should resolve the issue. 
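As a side note on the mechanism: pip only considers a wheel whose platform tag it supports, which is why an old pip silently falls back to the sdist. A short sketch of inspecting those tags — this assumes the third-party `packaging` library (>= 20) is available; pip vendors the same tag logic internally:

```python
# Sketch: list the binary-wheel platform tags this interpreter accepts.
# Assumes the third-party `packaging` library is installed (pip vendors it).
from packaging.tags import sys_tags

platforms = {tag.platform for tag in sys_tags()}
print(sorted(platforms))
# An old pip (< 19.0) only recognises the legacy manylinux1 platform tag,
# so a *-manylinux2010_*.whl upload is invisible to it and pip falls back
# to the .tar.gz sdist, triggering a source build.
```

If `manylinux2010` is absent from the printed set, that interpreter's pip will never pick the 0.24 wheels.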
I am curious whether fetching the wheel by hand and installing it via `pip` would be a workaround (not practical for automated usage, though). On Thu, 21 Jan 2021 at 00:34, The Helmbolds via scikit-learn < scikit-learn at python.org> wrote: > Use the Anaconda Python installation. > > "You won't find the right answers if you don't ask the right questions!" > (Robert Helmbold, 2013) > > > On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lemaître < > g.lemaitre58 at gmail.com> wrote: > > > Basically it gets the tar with the source and recompiles instead of using > the wheel. Could you force an install from PyPI without using the cached > file? > > We pushed wheels yesterday for 0.24.1 as well, so it should not get the > 0.24.0 version. > > For 0.23.2, you can see that it used the wheel (.whl). > > Sent from my phone - sorry to be brief and potentially misspelled. > *From:* bertrand25mtl at gmail.com > *Sent:* 20 January 2021 23:21 > *To:* scikit-learn at python.org > *Reply to:* scikit-learn at python.org > *Subject:* [scikit-learn] scikit-learn 0.24 installation fails with > ModuleNotFoundError: No module named 'scipy' > > To whom it may concern, > > I am trying to install scikit-learn in a PySpark job using the > install_pypi_package PySpark API but the install fails with : > > sc.install_pypi_package("scikit-learn") > > [...]
> Could you please help me understand why the scikit-learn 0.24 installation > fails ? > > Thank you for your help, > > Bertrand > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.lemaitre58 at gmail.com Fri Jan 22 04:11:43 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 10:11:43 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: @Bertrand Could you tell us which version of `pip` you use? (You need pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014.) On Fri, 22 Jan 2021 at 09:49, Guillaume Lemaître wrote: > We might experience an issue with PyPI not selecting the manylinux2010 > wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 > We have to check, but we will probably shortly upload manylinux1 wheels > that should resolve the issue. > > [...]
-- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mahmood.nt at gmail.com Fri Jan 22 04:13:15 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Fri, 22 Jan 2021 10:13:15 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable Message-ID: Hi, I have a question about PCA: how can we determine which factor (principal component) best captures a given variable X? For example, a variable may have a low weight in the first PC but a higher weight in the fifth PC. When I use the PCA from scikit-learn, I have to work with the PCs manually, so I may miss the fact that although a variable is weak in the PC1-PC2 plot, it may be strong in the PC4-PC5 plot. Any comment on that? Regards, Mahmood From g.lemaitre58 at gmail.com Fri Jan 22 04:25:54 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 10:25:54 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: I am not really understanding the question, sorry. Are you looking for the `explained_variance_ratio_` attribute, which gives you the relative values of the eigenvalues associated with the eigenvectors? On Fri, 22 Jan 2021 at 10:16, Mahmood Naderan wrote: > Hi > I have a question about PCA [...]
> > Regards, > Mahmood > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From julio at esbet.es Fri Jan 22 05:17:22 2021 From: julio at esbet.es (Julio Antonio Soto) Date: Fri, 22 Jan 2021 11:17:22 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Mahmood, I believe your question is answered here: https://stackoverflow.com/questions/22984335/recovering-features-names-of-explained-variance-ratio-in-pca-with-sklearn > On 22 Jan 2021, at 10:26, Guillaume Lemaître wrote: > > [...]
>> >> Regards, >> Mahmood >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Fri Jan 22 05:50:30 2021 From: niourf at gmail.com (Nicolas Hug) Date: Fri, 22 Jan 2021 10:50:30 +0000 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Mahmood, There are different pieces of information that you can get from PCA: 1. How important a given PC is for reconstructing the entire dataset -> this is given by explained_variance_ratio_, as Guillaume suggested. 2. What the contribution of each feature to each PC is (remember that a PC is a linear combination of all the features, i.e. PC_1 = X_1 . alpha_11 + X_2 . alpha_12 + ... + X_m . alpha_1m). The alpha_ij are what you're looking for, and they are given in the components_ matrix, which is an n_components x n_features matrix. Nicolas On 1/22/21 9:13 AM, Mahmood Naderan wrote: > Hi > I have a question about PCA [...]
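Nicolas's two points can be tried directly with a small sketch; the toy dataset and the argmax summary below are illustrative, not part of his answer:

```python
# Sketch: which principal component captures each feature most strongly?
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=4).fit(X)

# 1. Importance of each PC for reconstructing the dataset:
print(pca.explained_variance_ratio_)

# 2. components_ has shape (n_components, n_features); entry [i, j] is the
#    weight alpha_ij of feature j in PC_i. The PC on which a feature loads
#    most strongly is the argmax of |alpha| over the component axis.
strongest_pc = np.argmax(np.abs(pca.components_), axis=0)
for j, i in enumerate(strongest_pc):
    print(f"feature {j} has its largest weight in PC{i + 1}")
```

This answers Mahmood's concern directly: instead of eyeballing PC1-PC2 or PC4-PC5 plots, the argmax over `components_` reports, per feature, the PC where its weight is largest.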
> > Regards, > Mahmood > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From bertrand25mtl at gmail.com Fri Jan 22 08:37:21 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Fri, 22 Jan 2021 08:37:21 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: Thank you Guillaume for your help, I am using : (running on AWS EMR-6.2) pip3 --version pip 9.0.3 from /usr/lib/python3.7/site-packages (python 3.7) pip3 install scikit-learn Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Collecting scipy>=0.19.1 (from scikit-learn) Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn) Installing collected packages: scipy, scikit-learn Running setup.py install for scikit-learn ... error Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-93pagltp/scikit-learn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0ulalx36-record/install-record.txt --single-version-externally-managed --compile: Partial import of sklearn during the build process. 
Traceback (most recent call last): File "/mnt/tmp/pip-build-93pagltp/scikit-learn/sklearn/_build_utils/__init__.py", line 27, in _check_cython_version import Cython ModuleNotFoundError: No module named 'Cython' Upgrading pip to 20.3.3: sudo pip3 install --upgrade pip sudo ln -s /usr/local/bin/pip3 /usr/bin/pip3 pip3 --version pip 20.3.3 from /usr/local/lib/python3.7/site-packages/pip (python 3.7) lets me install from the whl file: pip3 install scikit-learn Collecting scikit-learn Downloading scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB) However, using the API sc.install_pypi_package("scikit-learn") still uses the tar file instead of the whl file (even after the pip upgrade). Collecting scikit-learn Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz Thanks for your help, Cheers, Bertrand On Fri, 22 Jan 2021 at 04:13, Guillaume Lemaître wrote: > @Bertrand Could you tell us which version of `pip` you use? (You need > pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014.) > > On Fri, 22 Jan 2021 at 09:49, Guillaume Lemaître > wrote: > >> [...]
> _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.lemaitre58 at gmail.com Fri Jan 22 09:04:32 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Fri, 22 Jan 2021 15:04:32 +0100 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: OK, so the normal install is working. Now, to fix your issue, we need to understand how `sc.install_pypi_package` works and, in particular, how it calls `pip`. We need to make sure that it calls the right pip (the system `pip3` in your case). On Fri, 22 Jan 2021 at 14:39, Bertrand B. wrote: > Thank you Guillaume for your help, > > [...]
>>>> >>>> Thank you for your help, >>>> >>>> Bertrand >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahmood.nt at gmail.com Fri Jan 22 15:48:46 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Fri, 22 Jan 2021 21:48:46 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: Hi Thanks for the replies. I read about the available functions in the PCA section. 
Consider the following code x = StandardScaler().fit_transform(x) pca = PCA() principalComponents = pca.fit_transform(x) principalDf = pd.DataFrame(data = principalComponents) loadings = pca.components_ finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) print( "First and second observations\n", finalDf.loc[0:1] ) print( "loadings[0:1]\n", loadings[0], loadings[1] ) print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) The output looks like First and second observations 0 1 2 3 4 kernel 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 loadings[0:1] [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 -0.01257726 0.29718078 0.07493325 0.07562934] explained_variance_ratio_ [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] As you can see for two kernels named ELEC1 and ELEC2, there are five PCs from 0 to 4. Now based on the numbers in the loadings, I expect that loadings[0] which is the first variable is better shown on PC1-PC2 plane (0.49137412,0.46511098). However, loadings[1] which is the second variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). Is this understanding correct? I don't understand what explained_variance_ratio_ is trying to say here. Regards, Mahmood On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > Hi Mahmood, > > There are different pieces of info that you can get from PCA: > > 1. How important is a given PC to reconstruct the entire dataset -> This > is given by explained_variance_ratio_ as Guillaume suggested > > 2. What is the contribution of each feature to each PC (remember that a > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). The alpha_ij are what > you're looking for and they are given in the components_ matrix which is > a n_components x n_features matrix. 
> > Nicolas > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > Hi > > I have a question about PCA and that is, how we can determine, a > > variable, X, is better captured by which factor (principal > > component)? For example, maybe one variable has low weight in the > > first PC but has a higher weight in the fifth PC. > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > therefore, I may miss the point that although a variable is weak in > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > Any comment on that? > > > > Regards, > > Mahmood > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From bertrand25mtl at gmail.com Sat Jan 23 11:16:20 2021 From: bertrand25mtl at gmail.com (Bertrand B.) Date: Sat, 23 Jan 2021 11:16:20 -0500 Subject: [scikit-learn] scikit-learn 0.24 installation fails with ModuleNotFoundError: No module named 'scipy' In-Reply-To: References: <5m3ju06gkblbocg20jr9uhme.1611184565040@gmail.com> <2130832503.2194460.1611185533518@mail.yahoo.com> Message-ID: Thank you Guillaume for your help, When I start a Spark cluster on AWS, I add a bootstrap step to update pip and install sklearn so that users no longer have to install scikit-learn in their job with sc.install_pypi_package. We are using Spark with sklearn to run hyper-parameter tuning using spark to run many model configurations in parallel (broadcasting the pandas dataframe and running independent models on each Spark container). That is why we need to have scikit learn installed on each worker node. This technique works very well conditional that the pandas dataframe fits in the container memory (each spark container will have a copy of the pandas dataframe). 
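For anyone curious, the pattern described above can be sketched outside Spark as well. The snippet below is a minimal illustration only: a thread pool stands in for the Spark executors, the data and configurations are made up, and in PySpark the same idea would be broadcasting the DataFrame and mapping the configurations over the cluster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Shared data: every worker reads the same arrays (in Spark this would be
# the broadcast pandas DataFrame).
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One independent model configuration per task.
configs = [{"alpha": a} for a in (0.01, 0.1, 1.0, 10.0)]

def fit_one(params):
    # Each task fits and scores one configuration on the shared data.
    model = Ridge(**params).fit(X_tr, y_tr)
    return params, model.score(X_te, y_te)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fit_one, configs))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, round(best_score, 3))
```

Each task only reads the shared data, so the workers stay independent; the same constraint mentioned above applies, namely that the data must fit in each worker's memory.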
Thank you for your great work and help, Cheers, Bertrand On Fri, Jan 22, 2021 at 09:06, Guillaume Lemaître wrote: > OK, so the normal install is working. Now, to fix your issue we need to > understand how `sc.install_pypi_package` is working and mainly how it > calls `pip`. We need to make sure that it calls the right pip (the system > `pip3` in your case). > > > On Fri, 22 Jan 2021 at 14:39, Bertrand B. wrote: > >> Thank you Guillaume for your help, >> >> I am using : (running on AWS EMR-6.2) >> pip3 --version >> pip 9.0.3 from /usr/lib/python3.7/site-packages (python 3.7) >> >> >> pip3 install scikit-learn >> >> Collecting scikit-learn >> Using cached >> https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz >> Requirement already satisfied: numpy>=1.13.3 in >> /usr/local/lib64/python3.7/site-packages (from scikit-learn) >> Collecting scipy>=0.19.1 (from scikit-learn) >> Using cached >> https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl >> Requirement already satisfied: joblib>=0.11 in >> /usr/local/lib64/python3.7/site-packages (from scikit-learn) >> Requirement already satisfied: threadpoolctl>=2.0.0 in >> /usr/local/lib/python3.7/site-packages (from scikit-learn) >> Installing collected packages: scipy, scikit-learn >> Running setup.py install for scikit-learn ... error >> Complete output from command /usr/bin/python3 -u -c "import >> setuptools, >> tokenize;__file__='/mnt/tmp/pip-build-93pagltp/scikit-learn/setup.py';f=getattr(tokenize, >> 'open', open)(__file__);code=f.read().replace('\r\n', >> '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record >> /tmp/pip-0ulalx36-record/install-record.txt >> --single-version-externally-managed --compile: >> Partial import of sklearn during the build process.
>> Traceback (most recent call last): >> File >> "/mnt/tmp/pip-build-93pagltp/scikit-learn/sklearn/_build_utils/__init__.py", >> line 27, in _check_cython_version >> import Cython >> ModuleNotFoundError: No module named 'Cython' >> >> >> Upgrading pip to 20.3.3 : >> >> sudo pip3 install --upgrade pip >> sudo ln -s /usr/local/bin/pip3 /usr/bin/pip3 >> >> pip3 --version >> pip 20.3.3 from /usr/local/lib/python3.7/site-packages/pip (python 3.7) >> >> let me install from the whl file : >> pip3 install scikit-learn >> Collecting scikit-learn >> Downloading scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl >> (22.3 MB) >> >> However, using the API sc.install_pypi_package("scikit-learn") still uses >> the tar file instead of the whl file (even after the pip upgrade). >> >> Collecting scikit-learn >> Using cached https://files.pythonhosted.org/packages/f4/7b/d415b0c89babf23dcd8ee631015f043e2d76795edd9c7359d6e63257464b/scikit-learn-0.24.1.tar.gz >> >> >> Thanks for your help, >> >> Cheers, >> >> Bertrand >> >> Le ven. 22 janv. 2021 ? 04:13, Guillaume Lema?tre >> a ?crit : >> >>> @Bertrand Could you tell us which version of `pip` to you use (you need >>> pip >= 19.0 for manylinux2010 and pip >= 19.3 for manylinux2014) >>> >>> On Fri, 22 Jan 2021 at 09:49, Guillaume Lema?tre >>> wrote: >>> >>>> We might experience an issue with PyPI not selecting the manylinux2010 >>>> wheel: https://github.com/scikit-learn/scikit-learn/issues/19233 >>>> We have to check but we will probably shortly upload manylinux1 wheels >>>> that should resolve the issue. >>>> >>>> I am curious if fetching the wheel by hand and installing via `pip` >>>> would be a workaround (not practical for automated usage thought). >>>> >>>> On Thu, 21 Jan 2021 at 00:34, The Helmbolds via scikit-learn < >>>> scikit-learn at python.org> wrote: >>>> >>>>> Use the Anaconda Python installation. >>>>> >>>>> "You won't find the right answers if you don't ask the right >>>>> questions!" 
(Robert Helmbold, 2013) >>>>> >>>>> >>>>> On Wednesday, January 20, 2021, 04:16:15 PM MST, Guillaume Lema?tre < >>>>> g.lemaitre58 at gmail.com> wrote: >>>>> >>>>> >>>>> Basically it get the tar with the source and recompile instead of >>>>> using the wheel. Could you force an install from PyPI without using the >>>>> cached file. >>>>> >>>>> We pushed wheels yesterday for 0.24.1 as well so it should not get the >>>>> 0.24.0 version. >>>>> >>>>> For 0.23.2, you can see that it used the wheel (.whl). >>>>> >>>>> Sent from my phone - sorry to be brief and potential misspell. >>>>> *From:* bertrand25mtl at gmail.com >>>>> *Sent:* 20 January 2021 23:21 >>>>> *To:* scikit-learn at python.org >>>>> *Reply to:* scikit-learn at python.org >>>>> *Subject:* [scikit-learn] scikit-learn 0.24 installation fails with >>>>> ModuleNotFoundError: No module named 'scipy' >>>>> >>>>> To whom it may concern, >>>>> >>>>> I am trying to install scikit-learn in a PySpark job using the >>>>> install_pypi_package PySpark API but the install fails with : >>>>> >>>>> sc.install_pypi_package("scikit-learn") >>>>> >>>>> Collecting scikit-learn >>>>> Using cached https://files.pythonhosted.org/packages/db/e2/9c0bde5f81394b627f623557690536b12017b84988a4a1f98ec826edab9e/scikit-learn-0.24.0.tar.gz >>>>> Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) >>>>> Collecting scipy>=0.19.1 (from scikit-learn) >>>>> Using cached https://files.pythonhosted.org/packages/58/9d/8296d8211318d690119eba6d293b7a149c1c51c945342dd4c3816f79e1ba/scipy-1.6.0-cp37-cp37m-manylinux1_x86_64.whl >>>>> Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn) >>>>> Collecting threadpoolctl>=2.0.0 (from scikit-learn) >>>>> Using cached https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl >>>>> Building wheels for collected 
packages: scikit-learn >>>>> Running setup.py bdist_wheelfor scikit-learn: started >>>>> Running setup.py bdist_wheelfor scikit-learn: finished with status 'error' >>>>> Complete output from command /tmp/1611000009300-0/bin/python -u -c "import setuptools, tokenize;__file__='/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ';f=getattr(tokenize, 'open', open)(__file__);code=f.read ().replace('\r\n', '\n');f.close ();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpry3gf9r0pip-wheel- --python-tag cp37: >>>>> Partial import of sklearn during the build process. >>>>> Traceback (most recent call last): >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 201, in check_package_status >>>>> module = importlib.import_module(package) >>>>> File "/tmp/1611000009300-0/lib64/python3.7/importlib/__init__.py", line 127, in import_module >>>>> return _bootstrap._gcd_import(name[level:], package, level) >>>>> File "", line 1006, in _gcd_import >>>>> File "", line 983, in _find_and_load >>>>> File "", line 965, in _find_and_load_unlocked >>>>> ModuleNotFoundError: No module named 'scipy' >>>>> Traceback (most recent call last): >>>>> File "", line 1, in >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 306, in >>>>> setup_package() >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 294, in setup_package >>>>> check_package_status('scipy', min_deps.SCIPY_MIN_VERSION) >>>>> File "/mnt/tmp/pip-build-phc6p6gl/scikit-learn/setup.py ", line 227, in check_package_status >>>>> .format(package, req_str, instructions)) >>>>> ImportError: scipy is not installed. >>>>> scikit-learn requires scipy >= 0.19.1. 
>>>>> >>>>> I do not encounter this error with scikit-learn 0.23.2 : >>>>> >>>>> sc.install_pypi_package("scikit-learn==0.23.2") >>>>> >>>>> Collecting scikit-learn==0.23.2 >>>>> Using cached https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl >>>>> Requirement already satisfied: scipy>=0.19.1 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: joblib>=0.11 in /usr/local/lib64/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Requirement already satisfied: threadpoolctl>=2.0.0 in /mnt/tmp/1611000009300-0/lib/python3.7/site-packages (from scikit-learn==0.23.2) >>>>> Installing collected packages: scikit-learn >>>>> Successfully installed scikit-learn-0.23.2 >>>>> >>>>> >>>>> Could you please help me understand why the scikit-learn 0.24 >>>>> installation fails ? 
>>>>> >>>>> Thank you for your help, >>>>> >>>>> Bertrand >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> _______________________________________________ >>>>> scikit-learn mailing list >>>>> scikit-learn at python.org >>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>> >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> Scikit-learn @ Inria Foundation >>>> https://glemaitre.github.io/ >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivertomic at zoho.com Sun Jan 24 06:52:57 2021 From: olivertomic at zoho.com (Oliver Tomic) Date: Sun, 24 Jan 2021 12:52:57 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: References: Message-ID: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> Hi Mahmood, the information you need is given by the individual explained variance for each variable / feature. You get that information from the hoggorm package (Python): https://github.com/olivertomic/hoggorm https://hoggorm.readthedocs.io/en/latest/index.html
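As a side note, staying within scikit-learn, a rough version of this per-variable view can be computed by hand (a sketch, not hoggorm's exact definition): because the PC scores are uncorrelated, the variance of a standardized variable splits additively across components as explained_variance_[k] * components_[k, j] ** 2.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

# Variance of standardized variable j carried by PC k:
# explained_variance_[k] * components_[k, j] ** 2.
# The scores are uncorrelated, so these contributions simply add up.
contrib = pca.explained_variance_[:, None] * pca.components_ ** 2
frac = contrib / contrib.sum(axis=0)   # normalize: each column sums to 1

best_pc = frac.argmax(axis=0)          # PC that best captures each variable
print(best_pc)
print(np.round(frac.cumsum(axis=0), 3))  # per-variable cumulative explained variance
```

The column-wise argmax then answers the original question directly: for each variable, which component carries most of its variance.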
Here is one of the PCA examples provided in a Jupyter notebook: https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb When you do PCA you get the information by calling for example: cumCalExplVar_individualVariable = model.X_cumCalExplVar() (which gives you the cumulative calibrated explained variance for each variable, cell 21 in the notebook) cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives you the cumulative validated explained variance for each variable, cell 30 in the notebook) The component where you get the biggest jump for the variable of interest is the component you are looking for. You could also have a look at the correlation loadings to identify the component you are looking for. cheers Oliver ---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan wrote ---- Hi Thanks for the replies. I read about the available functions in the PCA section. Consider the following code x = StandardScaler().fit_transform(x) pca = PCA() principalComponents = pca.fit_transform(x) principalDf = pd.DataFrame(data = principalComponents) loadings = pca.components_ finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) print( "First and second observations\n", finalDf.loc[0:1] ) print( "loadings[0:1]\n", loadings[0], loadings[1] ) print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) The output looks like First and second observations 0 1 2 3 4 kernel 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 loadings[0:1] [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 -0.01257726 0.29718078 0.07493325 0.07562934] explained_variance_ratio_ [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] As you can see for two kernels named ELEC1 and ELEC2, there are five PCs from 0 to 4.
Now based on the numbers in the loadings, I expect that loadings[0] which is the first variable is better shown on PC1-PC2 plane (0.49137412,0.46511098). However, loadings[1] which is the second variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). Is this understanding correct? I don't understand what explained_variance_ratio_ is trying to say here. Regards, Mahmood On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > Hi Mahmood, > > There are different pieces of info that you can get from PCA: > > 1. How important is a given PC to reconstruct the entire dataset -> This > is given by explained_variance_ratio_ as Guillaume suggested > > 2. What is the contribution of each feature to each PC (remember that a > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). The alpha_ij are what > you're looking for and they are given in the components_ matrix which is > a n_components x n_features matrix. > > Nicolas > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > Hi > > I have a question about PCA and that is, how we can determine, a > > variable, X, is better captured by which factor (principal > > component)? For example, maybe one variable has low weight in the > > first PC but has a higher weight in the fifth PC. > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > therefore, I may miss the point that although a variable is weak in > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > Any comment on that? 
> > > > Regards, > > Mahmood > > _______________________________________________ > > scikit-learn mailing list > > mailto:scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > mailto:scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list mailto:scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahmood.nt at gmail.com Sun Jan 24 15:37:49 2021 From: mahmood.nt at gmail.com (Mahmood Naderan) Date: Sun, 24 Jan 2021 21:37:49 +0100 Subject: [scikit-learn] Finding the PC that captures a specific variable In-Reply-To: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> References: <177343d6ef2.b0a17de91966.6692440022474630306@zoho.com> Message-ID: Hi Olivier, Thanks for the suggestion. The package seems to be handy. I will try that. Regards, Mahmood On Sun, Jan 24, 2021 at 12:55 PM Oliver Tomic via scikit-learn wrote: > > Hi Mahmood, > > the information you need is given by the individual explained variance for each variable / feature. 
You get that information from the hoggorm package (Python): > > https://github.com/olivertomic/hoggorm > https://hoggorm.readthedocs.io/en/latest/index.html > > Here is one of the PCA examples provided in a Jupyter notebook: > https://github.com/olivertomic/hoggorm/blob/master/examples/PCA/PCA_on_cancer_data.ipynb > > > When you do PCA you get the information by calling for example: > > cumCalExplVar_individualVariable = model.X_cumCalExplVar() (which gives you the cumulative calibrated explained variance for each variable, cell 21 in the notebook) > > cumValExplVar_individualVariable = model.X_cumValExplVar_indVar() (which gives you the cumulative validated explained variance variable, cell 30 in the notebook) > > > The component where you get the biggest jump for the variable of interest is the component you are looking for. > > You could also have a look at the correlation loadings to identify the component you are looking for. > > cheers > Oliver > > > > > > > ---- On Fri, 22 Jan 2021 21:48:46 +0100 Mahmood Naderan wrote ---- > > Hi > Thanks for the replies. I read about the available functions in the > PCA section. 
Consider the following code > > x = StandardScaler().fit_transform(x) > pca = PCA() > principalComponents = pca.fit_transform(x) > principalDf = pd.DataFrame(data = principalComponents) > loadings = pca.components_ > finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], 1) > print( "First and second observations\n", finalDf.loc[0:1] ) > print( "loadings[0:1]\n", loadings[0], loadings[1] ) > print ("explained_variance_ratio_\n",pca.explained_variance_ratio_) > > > The output looks like > > First and second observations > 0 1 2 3 4 kernel > 0 2.959846 -0.184307 -0.100236 0.533735 -0.002227 ELEC1 > 1 0.390313 1.805239 0.029688 -0.502359 -0.002350 ELECT2 > loadings[0:1] > [0.21808984 0.49137412 0.46511098 0.49735819 0.49728754] [-0.94878375 > -0.01257726 0.29718078 0.07493325 0.07562934] > explained_variance_ratio_ > [7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06] > > > > As you can see for two kernels named ELEC1 and ELEC2, there are five > PCs from 0 to 4. > Now based on the numbers in the loadings, I expect that loadings[0] > which is the first variable is better shown on PC1-PC2 plane > (0.49137412,0.46511098). However, loadings[1] which is the second > variable is better shown on PC0-PC2 plane (-0.94878375,0.29718078). > Is this understanding correct? > > I don't understand what explained_variance_ratio_ is trying to say here. > > > Regards, > Mahmood > > On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug wrote: > > > > Hi Mahmood, > > > > There are different pieces of info that you can get from PCA: > > > > 1. How important is a given PC to reconstruct the entire dataset -> This > > is given by explained_variance_ratio_ as Guillaume suggested > > > > 2. What is the contribution of each feature to each PC (remember that a > > PC is a linear combination of all the features i.e.: PC_1 = X_1 . > > alpha_11 + X_2 . alpha_12 + ... X_m . alpha_1m). 
The alpha_ij are what > > you're looking for and they are given in the components_ matrix which is > > a n_components x n_features matrix. > > > > Nicolas > > > > On 1/22/21 9:13 AM, Mahmood Naderan wrote: > > > Hi > > > I have a question about PCA and that is, how we can determine, a > > > variable, X, is better captured by which factor (principal > > > component)? For example, maybe one variable has low weight in the > > > first PC but has a higher weight in the fifth PC. > > > > > > When I use the PCA from Scikit, I have to manually work with the PCs, > > > therefore, I may miss the point that although a variable is weak in > > > PC1-PC2 plot, it may be strong in PC4-PC5 plot. > > > > > > Any comment on that? > > > > > > Regards, > > > Mahmood > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From rdslater at gmail.com Sun Jan 31 14:43:32 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 13:43:32 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) Message-ID: I was writing an example for my students when I came across what I think is an issue. In version 0.24.1 using the LassoCV, the .coef variable should have a list of my coefficients (at least according to my understanding of the documents).
However, the variable is not populated and throws an error 'LassoCV' object has no attribute 'coef' I do have a .coef_ variable which I believe is the coefficient for the best fit only. the alphas and alphas_ variables have a similar issue in that alphas returns nothing while alphas_ returns the list of alphas used. I'm not sure if this is a documentation oversight or a real issue but wanted to get clarification. I can get what I need from other methods, but wanted to see if this needed to be addressed. Best Regards, Robert Slater -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:00:34 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:00:34 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: Hi Robert, > I do have a .coef_ variable which I believe is the coefficient for the best fit only. `coef` never existed. Fitted attributes always end with underscore. We do not store coefficients for all fitted `alphas_`. We provide some information regarding the MSE path for all tried alphas: https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html > the alphas and alphas_ variables have a similar issue in that alphas returns nothing while alphas_ returns the list of alphas used. You probably created a model such as `model = LassoCV()`. By default, the parameter `alphas=None`, thus accessing it will return None. After fitting, `alphas_` will be automatically created as specified in the documentation. It will correspond to the values tried by cross-validation. If instead, you are passing an array to `alphas`, then `alphas_` will be the same as the `alphas` you passed, after calling `fit`. Cheers, On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: > I was writing an example for my students when I came across what I think
In version 24.1 using the LassoCV, the .coef variable > should have a list of my coeficeients (at least according to my > understanding of the documents). However, the variable is not populated > nad throws an error > > 'LassoCV' object has no attribute 'coef' > > > I do have a .coef_ variable which I believe is the coefficient for the > best fit only. > > the alphas and alphas_ variables have a similar issue in that alphas > returns nothing while alphas_ returns the list of alphas used. > > I'm not sure if this is an documentation oversight or a real issue but > wanted to get clarification. > > I can get what I need from o ther methods, but wanted to see if this > needed to be addressed. > > Best Regards, > > Robert Slater > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdslater at gmail.com Sun Jan 31 15:22:16 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 14:22:16 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: Appreciate the clarification. I definitely think the docs need some polish as coef_ only returns a single fitting of coefficients and not the coefficients along the path as stated in the api guide. I am seeing alpha_ alphas_ coef_ dual_gap_ as fitted variables (plus a few more) which is slightly different than the guide/api docs (all the names are plural in the api guide) I don't know if there is way to contribute an edit to the docs, I'd be more than happy to do it (Sorry I'm very OCD about such things, and I know this is a minor details)., I'd be happy to suggest the edit through proper channels. 
On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre wrote: > Hi Robert, > > > I do have a .coef_ variable which I believe is the coefficient for the > best fit only. > > `coef` never existed. Fitted attributes always end with underscore. > We do not store coefficients for all fitted `alphas_`. > We provide some information regarding the MSE path for all tried alphas: > https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html > > > the alphas and alphas_ variables have a similar issue in that alphas > returns nothing while alphas_ returns the list of alphas used. > > You probably created a model such as `model = LasssoCV()`. By default, the > parameter `alpha=None` thus accessing it will return None. After fitting, > `alphas_` will be automatically created as specified in the documentation. > It will correspond to the values tried by cross-validation. > If instead, you are passing an array to `alphas` then `alphas_` will be > the same as `alphas_` after calling `fit`. > > Cheers, > > > On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: > >> I was writing an example for my students when I came across what I think >> is an issue. In version 24.1 using the LassoCV, the .coef variable >> should have a list of my coeficeients (at least according to my >> understanding of the documents). However, the variable is not populated >> nad throws an error >> >> 'LassoCV' object has no attribute 'coef' >> >> >> I do have a .coef_ variable which I believe is the coefficient for the >> best fit only. >> >> the alphas and alphas_ variables have a similar issue in that alphas >> returns nothing while alphas_ returns the list of alphas used. >> >> I'm not sure if this is an documentation oversight or a real issue but >> wanted to get clarification. >> >> I can get what I need from o ther methods, but wanted to see if this >> needed to be addressed. 
>> >> Best Regards, >> >> Robert Slater >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:37:19 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:37:19 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > Appreciate the clarification. I definitely think the docs need some > polish, as coef_ only returns a single fitting of coefficients and not the > coefficients along the path as stated in the API guide. > I am confused here. LassoCV states: *coef : *ndarray of shape (n_features,) or (n_targets, n_features) Parameter vector (w in the cost function formula). So it seems to be exactly what it is returning. It does not return the coefficients along the path. Which documentation are you referring to when stating the API guide (if you could provide a link, it would be really helpful)? > I am seeing > > alpha_ > alphas_ > coef_ > dual_gap_ > > as fitted variables (plus a few more) which is slightly different from the > guide/API docs (all the names are plural in the API guide) > > I don't know if there is a way to contribute an edit to the docs, I'd be > more than happy to do it (sorry, I'm very OCD about such things, and I know > this is a minor detail). I'd be happy to suggest the edit through proper > channels.
> You can always open a PR in the GitHub scikit-learn repository because the documentation is actually the docstring from the classes and functions. The user guide documentation is located in the /doc folder and the contributing guide will be helpful to start with: https://scikit-learn.org/stable/developers/contributing.html > > On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre > wrote: > >> Hi Robert, >> >> > I do have a .coef_ variable which I believe is the coefficient for the >> best fit only. >> >> `coef` never existed. Fitted attributes always end with underscore. >> We do not store coefficients for all fitted `alphas_`. >> We provide some information regarding the MSE path for all tried alphas: >> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >> >> > the alphas and alphas_ variables have a similar issue in that alphas >> returns nothing while alphas_ returns the list of alphas used. >> >> You probably created a model such as `model = LasssoCV()`. By default, >> the parameter `alpha=None` thus accessing it will return None. After >> fitting, >> `alphas_` will be automatically created as specified in the >> documentation. It will correspond to the values tried by cross-validation. >> If instead, you are passing an array to `alphas` then `alphas_` will be >> the same as `alphas_` after calling `fit`. >> >> Cheers, >> >> >> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >> >>> I was writing an example for my students when I came across what I think >>> is an issue. In version 24.1 using the LassoCV, the .coef variable >>> should have a list of my coeficeients (at least according to my >>> understanding of the documents). However, the variable is not populated >>> nad throws an error >>> >>> 'LassoCV' object has no attribute 'coef' >>> >>> >>> I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. 
>>> the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> I'm not sure if this is a documentation oversight or a real issue but >>> wanted to get clarification. >>> >>> I can get what I need from other methods, but wanted to see if this >>> needed to be addressed. >>> >>> Best Regards, >>> >>> Robert Slater >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Sun Jan 31 15:38:20 2021 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Sun, 31 Jan 2021 21:38:20 +0100 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 at 21:37, Guillaume Lemaître wrote: > > > On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > >> Appreciate the clarification. I definitely think the docs need some >> polish, as coef_ only returns a single fitting of coefficients and not the >> coefficients along the path as stated in the API guide. >> > > I am confused here. LassoCV states: > > *coef : *ndarray of shape (n_features,) or (n_targets, n_features) > Oops, `coef_` indeed (I messed up the copy-paste) > Parameter vector (w in the cost function formula).
> So it seems exactly what it is returning. It does not return the > coefficients along the path. > Which documentation are you referring to when stating the API guide (if > you could provide a link, it would be really helpful)? > > >> I am seeing >> >> alpha_ >> alphas_ >> coef_ >> dual_gap_ >> >> as fitted variables (plus a few more) which is slightly different than >> the guide/api docs (all the names are plural in the api guide) >> >> I don't know if there is way to contribute an edit to the docs, I'd be >> more than happy to do it (Sorry I'm very OCD about such things, and I know >> this is a minor details)., I'd be happy to suggest the edit through proper >> channels. >> > > You can always open a PR in the GitHub scikit-learn repository because the > documentation is actually the docstring from the classes and functions. > The user guide documentation is located in the /doc folder and the > contributing guide will be helpful to start with: > https://scikit-learn.org/stable/developers/contributing.html > > >> >> On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> Hi Robert, >>> >>> > I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. >>> >>> `coef` never existed. Fitted attributes always end with underscore. >>> We do not store coefficients for all fitted `alphas_`. >>> We provide some information regarding the MSE path for all tried alphas: >>> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >>> >>> > the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> You probably created a model such as `model = LasssoCV()`. By default, >>> the parameter `alpha=None` thus accessing it will return None. After >>> fitting, >>> `alphas_` will be automatically created as specified in the >>> documentation. 
It will correspond to the values tried by cross-validation. >>> If instead, you are passing an array to `alphas` then `alphas_` will be >>> the same as `alphas_` after calling `fit`. >>> >>> Cheers, >>> >>> >>> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >>> >>>> I was writing an example for my students when I came across what I >>>> think is an issue. In version 24.1 using the LassoCV, the .coef >>>> variable should have a list of my coeficeients (at least according to my >>>> understanding of the documents). However, the variable is not populated >>>> nad throws an error >>>> >>>> 'LassoCV' object has no attribute 'coef' >>>> >>>> >>>> I do have a .coef_ variable which I believe is the coefficient for the >>>> best fit only. >>>> >>>> the alphas and alphas_ variables have a similar issue in that alphas >>>> returns nothing while alphas_ returns the list of alphas used. >>>> >>>> I'm not sure if this is an documentation oversight or a real issue but >>>> wanted to get clarification. >>>> >>>> I can get what I need from o ther methods, but wanted to see if this >>>> needed to be addressed. 
>>>> >>>> Best Regards, >>>> >>>> Robert Slater >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdslater at gmail.com Sun Jan 31 15:45:58 2021 From: rdslater at gmail.com (Robert Slater) Date: Sun, 31 Jan 2021 14:45:58 -0600 Subject: [scikit-learn] LassoCV.coef not implemented (I think) In-Reply-To: References: Message-ID: OK, it's on me--I was reading the return objects for the path method. My apologies. On Sun, Jan 31, 2021 at 2:38 PM Guillaume Lemaître wrote: > > > On Sun, 31 Jan 2021 at 21:24, Robert Slater wrote: > >> Appreciate the clarification. I definitely think the docs need some >> polish, as coef_ only returns a single fitting of coefficients and not the >> coefficients along the path as stated in the API guide. >> > > I am confused here. LassoCV states: > > *coef : *ndarray of shape (n_features,) or (n_targets, n_features) > > Parameter vector (w in the cost function formula). > So it seems to be exactly what it is returning. It does not return the > coefficients along the path. > Which documentation are you referring to when stating the API guide (if > you could provide a link, it would be really helpful)?
> > >> I am seeing >> >> alpha_ >> alphas_ >> coef_ >> dual_gap_ >> >> as fitted variables (plus a few more) which is slightly different than >> the guide/api docs (all the names are plural in the api guide) >> >> I don't know if there is way to contribute an edit to the docs, I'd be >> more than happy to do it (Sorry I'm very OCD about such things, and I know >> this is a minor details)., I'd be happy to suggest the edit through proper >> channels. >> > > You can always open a PR in the GitHub scikit-learn repository because the > documentation is actually the docstring from the classes and functions. > The user guide documentation is located in the /doc folder and the > contributing guide will be helpful to start with: > https://scikit-learn.org/stable/developers/contributing.html > > >> >> On Sun, Jan 31, 2021 at 2:02 PM Guillaume Lema?tre < >> g.lemaitre58 at gmail.com> wrote: >> >>> Hi Robert, >>> >>> > I do have a .coef_ variable which I believe is the coefficient for the >>> best fit only. >>> >>> `coef` never existed. Fitted attributes always end with underscore. >>> We do not store coefficients for all fitted `alphas_`. >>> We provide some information regarding the MSE path for all tried alphas: >>> https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html >>> >>> > the alphas and alphas_ variables have a similar issue in that alphas >>> returns nothing while alphas_ returns the list of alphas used. >>> >>> You probably created a model such as `model = LasssoCV()`. By default, >>> the parameter `alpha=None` thus accessing it will return None. After >>> fitting, >>> `alphas_` will be automatically created as specified in the >>> documentation. It will correspond to the values tried by cross-validation. >>> If instead, you are passing an array to `alphas` then `alphas_` will be >>> the same as `alphas_` after calling `fit`. 
>>> >>> Cheers, >>> >>> >>> On Sun, 31 Jan 2021 at 20:45, Robert Slater wrote: >>> >>>> I was writing an example for my students when I came across what I >>>> think is an issue. In version 24.1 using the LassoCV, the .coef >>>> variable should have a list of my coeficeients (at least according to my >>>> understanding of the documents). However, the variable is not populated >>>> nad throws an error >>>> >>>> 'LassoCV' object has no attribute 'coef' >>>> >>>> >>>> I do have a .coef_ variable which I believe is the coefficient for the >>>> best fit only. >>>> >>>> the alphas and alphas_ variables have a similar issue in that alphas >>>> returns nothing while alphas_ returns the list of alphas used. >>>> >>>> I'm not sure if this is an documentation oversight or a real issue but >>>> wanted to get clarification. >>>> >>>> I can get what I need from o ther methods, but wanted to see if this >>>> needed to be addressed. >>>> >>>> Best Regards, >>>> >>>> Robert Slater >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>> >>> >>> -- >>> Guillaume Lemaitre >>> Scikit-learn @ Inria Foundation >>> https://glemaitre.github.io/ >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL:
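The mix-up resolved above (the `path` method's documented return values versus the fitted `coef_` attribute) can be sketched as follows. The data and the `n_alphas` value are illustrative assumptions of mine, not from the thread.

```python
# Sketch (illustrative data, n_alphas chosen arbitrarily): lasso_path returns
# the coefficients along the whole regularization path, while LassoCV.coef_
# holds only the single vector for the selected alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, lasso_path

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

alphas, coefs, dual_gaps = lasso_path(X, y, n_alphas=20)
print(alphas.shape)  # (20,)   one alpha per step on the path
print(coefs.shape)   # (5, 20) one coefficient vector per alpha

model = LassoCV(cv=5).fit(X, y)
print(model.coef_.shape)  # (5,) a single vector, for model.alpha_ only
```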