<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Dear Roman,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">My concern is actually not about not mentioning the scaling but about not mentioning the centering.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">That is, the sklearn PCA removes the mean but it does not mention it in the help file.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">This was quite messy for me to debug as I expected it to either: 1/ center and scale simultaneously or / not scale and not center either.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">It would be beneficial to explicit the behavior in the help file in my opinion.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Ismael</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 16, 2017 at 8:02 AM, <span dir="ltr"><<a href="mailto:scikit-learn-request@python.org" target="_blank">scikit-learn-request@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send scikit-learn mailing list submissions to<br>
<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

To subscribe or unsubscribe via the World Wide Web, visit
	https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
	scikit-learn-request@python.org

You can reach the person managing the list at
	scikit-learn-owner@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."


Today's Topics:

   1. unclear help file for sklearn.decomposition.pca (Ismael Lemhadri)
   2. Re: unclear help file for sklearn.decomposition.pca (Roman Yurchak)
   3. Question about LDA's coef_ attribute (Serafeim Loukas)
   4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
   5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)


----------------------------------------------------------------------

Message: 1
Date: Sun, 15 Oct 2017 18:42:56 -0700
From: Ismael Lemhadri <lemhadri@stanford.edu>
To: scikit-learn@python.org
Subject: [scikit-learn] unclear help file for sklearn.decomposition.pca
Message-ID: <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18oKHCFN0u5DNzj+g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear all,
The help file for the PCA class is unclear about the preprocessing
performed on the data.
You can check on line 410 here:
https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
that the matrix is centered but NOT scaled before the singular
value decomposition is performed.
However, the help files make no mention of this.
This is confusing for someone who, like me, just wanted to check that
PCA and np.linalg.svd give the same results. In academic settings, students
are often asked to compare different methods and to check that they yield
the same results. I expect that many students have run into this problem
before...
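For instance, a minimal check of the centering (my own snippet, not from the
original message): the fitted estimator exposes the mean it subtracts as the
mean_ attribute.

```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(0).rand(6, 3)
>>> pca = PCA().fit(X)
>>> np.allclose(pca.mean_, X.mean(axis=0))  # the mean that fit() removed
True
```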
Best,
Ismael Lemhadri

------------------------------

Message: 2
Date: Mon, 16 Oct 2017 15:16:45 +0200
From: Roman Yurchak <rth.yurchak@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] unclear help file for sklearn.decomposition.pca
Message-ID: <b2abdcfd-4736-929e-6304-b93832932043@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Ismael,

as far as I can see, sklearn.decomposition.PCA doesn't mention scaling at
all (except for the whiten parameter, which is post-transformation scaling).

So since it doesn't mention it, it makes sense that it doesn't do any
scaling of the input. Same as np.linalg.svd.

You can verify that PCA and np.linalg.svd yield the same results, with

```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(42).rand(10, 4)
>>> n_components = 2
>>> PCA(n_components, svd_solver='full').fit_transform(X)
```

and

```
>>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
>>> (X - X.mean(axis=0)).dot(V[:n_components].T)
```
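
One caveat for this comparison (a note added here, not part of the original
message): the sign of each singular vector is arbitrary, and scikit-learn's
PCA applies its own deterministic sign convention (svd_flip), so corresponding
columns of the two results may differ by a factor of -1. Continuing the same
session, comparing absolute values avoids false mismatches:

```
>>> P1 = PCA(n_components, svd_solver='full').fit_transform(X)
>>> P2 = (X - X.mean(axis=0)).dot(V[:n_components].T)
>>> np.allclose(np.abs(P1), np.abs(P2))  # equal up to per-column signs
True
```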

-- 
Roman

On 16/10/17 03:42, Ismael Lemhadri wrote:
> [...]


------------------------------

Message: 3
Date: Mon, 16 Oct 2017 15:27:48 +0200
From: Serafeim Loukas <seralouk@gmail.com>
To: scikit-learn@python.org
Subject: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <58C6D0DA-9DE5-4EF5-97C1-48159831F5A9@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Dear Scikit-learn community,

Since the documentation of the LDA
(http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
is not so clear, I would like to ask whether the lda.coef_ attribute stores
the eigenvectors from the SVD decomposition.

Thank you in advance,
Serafeim

------------------------------

Message: 4
Date: Mon, 16 Oct 2017 16:57:52 +0200
From: Alexandre Gramfort <alexandre.gramfort@inria.fr>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

No, it stores the direction of the decision function, to match the API of
linear models.

HTH,
Alex
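
To make this concrete, here is a minimal sketch (toy data of my own, not
from the thread) showing where each quantity lives; the shapes assume a
binary problem:

```
>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> rng = np.random.RandomState(0)
>>> X, y = rng.rand(20, 4), np.repeat([0, 1], 10)
>>> lda = LinearDiscriminantAnalysis().fit(X, y)
>>> lda.coef_.shape      # decision-function weights: (1, n_features) for 2 classes
(1, 4)
>>> lda.scalings_.shape  # directions applied by transform(): (n_features, n_components)
(4, 1)
```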

On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <seralouk@gmail.com> wrote:
> [...]


------------------------------

Message: 5
Date: Mon, 16 Oct 2017 17:02:46 +0200
From: Serafeim Loukas <seralouk@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <413210D2-56AE-41A4-873F-D171BB36539D@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Dear Alex,

Thank you for the prompt response.

Are the eigenvectors stored in some variable?
Does the lda.scalings_ attribute contain the eigenvectors?

Best,
Serafeim

> On 16 Oct 2017, at 16:57, Alexandre Gramfort <alexandre.gramfort@inria.fr> wrote:
> [...]

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


------------------------------

End of scikit-learn Digest, Vol 19, Issue 25
********************************************