<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
The definition of PCA has a centering step, but no scaling step.<br>
<br>
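This is easy to confirm from the fitted estimator itself; a minimal sketch using the public `mean_` and `components_` attributes (synthetic data for illustration):<br>

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(20, 3) * 10 + 5  # nonzero mean, unscaled features

pca = PCA(n_components=2).fit(X)

# PCA stores the per-feature mean it subtracts before the SVD ...
assert np.allclose(pca.mean_, X.mean(axis=0))

# ... but it does not divide by the standard deviation: projecting the
# merely-centered data reproduces pca.transform exactly.
assert np.allclose(pca.transform(X),
                   (X - X.mean(axis=0)) @ pca.components_.T)
```

If scaling were part of the transform, the second check would fail unless the data had been standardized beforehand (e.g. with StandardScaler).<br>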
<div class="moz-cite-prefix">On 10/16/2017 11:16 AM, Ismael Lemhadri
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANpSPFQs-zfUeggZeMsN6NdLGv512MDCW2ZqB8cgK1hFFdfHfw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Dear Roman,</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">My concern is
actually not that the scaling goes unmentioned, but that the
centering does.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">That is,
sklearn's PCA removes the mean, but the help file does not
mention it.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">This was quite
messy for me to debug, as I expected it to either 1/ center
and scale simultaneously, or 2/ neither center nor scale.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">In my opinion,
it would be beneficial to make this behavior explicit in the
help file.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Ismael</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Oct 16, 2017 at 8:02 AM, <span
dir="ltr"><<a
href="mailto:scikit-learn-request@python.org"
target="_blank" moz-do-not-send="true">scikit-learn-request@python.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Send
scikit-learn mailing list submissions to<br>
<a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
or, via email, send a message with subject or body 'help'
to<br>
<a href="mailto:scikit-learn-request@python.org"
moz-do-not-send="true">scikit-learn-request@python.<wbr>org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:scikit-learn-owner@python.org"
moz-do-not-send="true">scikit-learn-owner@python.org</a><br>
<br>
When replying, please edit your Subject line so it is more
specific<br>
than "Re: Contents of scikit-learn digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. unclear help file for sklearn.decomposition.pca
(Ismael Lemhadri)<br>
2. Re: unclear help file for sklearn.decomposition.pca<br>
(Roman Yurchak)<br>
3. Question about LDA's coef_ attribute (Serafeim
Loukas)<br>
4. Re: Question about LDA's coef_ attribute (Alexandre
Gramfort)<br>
5. Re: Question about LDA's coef_ attribute (Serafeim
Loukas)<br>
<br>
<br>
------------------------------<wbr>------------------------------<wbr>----------<br>
<br>
Message: 1<br>
Date: Sun, 15 Oct 2017 18:42:56 -0700<br>
From: Ismael Lemhadri <<a
href="mailto:lemhadri@stanford.edu"
moz-do-not-send="true">lemhadri@stanford.edu</a>><br>
To: <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
Subject: [scikit-learn] unclear help file for<br>
sklearn.decomposition.pca<br>
Message-ID:<br>
<CANpSPFTgv+<wbr>Oz7f97dandmrBBayqf_o9w=<a
href="mailto:18oKHCFN0u5DNzj%2Bg@mail.gmail.com"
moz-do-not-send="true">18oKHCF<wbr>N0u5DNzj+g@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Dear all,<br>
The help file for the PCA class is unclear about the
preprocessing<br>
performed on the data.<br>
You can check on line 410 here:<br>
<a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/%0Adecomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<br>
decomposition/pca.py#L410</a><br>
that the matrix is centered but NOT scaled, before
performing the singular<br>
value decomposition.<br>
However, the help files do not make any mention of it.<br>
This is unclear for someone who, like me, just wanted to
verify that<br>
PCA and np.linalg.svd give the same results. In academic
settings, students<br>
are often asked to compare different methods and to check
that they yield<br>
the same results. I expect that many students have
confronted this problem<br>
before...<br>
Best,<br>
Ismael Lemhadri<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171015/c465bde7/attachment-0001.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171015/c465bde7/<wbr>attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 16 Oct 2017 15:16:45 +0200<br>
From: Roman Yurchak <<a
href="mailto:rth.yurchak@gmail.com"
moz-do-not-send="true">rth.yurchak@gmail.com</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] unclear help file for<br>
sklearn.decomposition.pca<br>
Message-ID: <<a
href="mailto:b2abdcfd-4736-929e-6304-b93832932043@gmail.com"
moz-do-not-send="true">b2abdcfd-4736-929e-6304-<wbr>b93832932043@gmail.com</a>><br>
Content-Type: text/plain; charset=utf-8; format=flowed<br>
<br>
Ismael,<br>
<br>
as far as I saw the sklearn.decomposition.PCA doesn't
mention scaling at<br>
all (except for the whiten parameter which is
post-transformation scaling).<br>
<br>
So since it doesn't mention it, it makes sense that it
doesn't do any<br>
scaling of the input. Same as np.linalg.svd.<br>
<br>
You can verify that PCA and np.linalg.svd yield the same
results, with<br>
<br>
```<br>
>>> import numpy as np<br>
>>> from sklearn.decomposition import PCA<br>
>>> import numpy.linalg<br>
>>> X = np.random.RandomState(42).rand(10, 4)<br>
>>> n_components = 2<br>
>>> PCA(n_components, svd_solver='full').fit_transform(X)<br>
```<br>
<br>
and<br>
<br>
```<br>
>>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)<br>
>>> (X - X.mean(axis=0)).dot(V[:n_components].T)<br>
```<br>
<br>
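One caveat when making this comparison: singular-vector signs are arbitrary, so the two results can differ by a per-column sign flip. A minimal sketch that compares them up to sign (same synthetic data as above):<br>

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(42).rand(10, 4)
n_components = 2

T_pca = PCA(n_components, svd_solver='full').fit_transform(X)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T_svd = Xc @ Vt[:n_components].T

# Align each column's sign before comparing.
signs = np.sign((T_pca * T_svd).sum(axis=0))
print(np.allclose(T_pca, T_svd * signs))
```
<br>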
--<br>
Roman<br>
<br>
On 16/10/17 03:42, Ismael Lemhadri wrote:<br>
> Dear all,<br>
> The help file for the PCA class is unclear about the
preprocessing<br>
> performed on the data.<br>
> You can check on line 410 here:<br>
> <a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a><br>
> <<a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a>><br>
> that the matrix is centered but NOT scaled, before
performing the<br>
> singular value decomposition.<br>
> However, the help files do not make any mention of
it.<br>
> This is unclear for someone who, like me, just wanted
to verify that<br>
> PCA and np.linalg.svd give the same results. In
academic settings,<br>
> students are often asked to compare different methods
and to check that<br>
> they yield the same results. I expect that many
students have confronted<br>
> this problem before...<br>
> Best,<br>
> Ismael Lemhadri<br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
> <a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
><br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Mon, 16 Oct 2017 15:27:48 +0200<br>
From: Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
To: <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
Subject: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID: <<a
href="mailto:58C6D0DA-9DE5-4EF5-97C1-48159831F5A9@gmail.com"
moz-do-not-send="true">58C6D0DA-9DE5-4EF5-97C1-<wbr>48159831F5A9@gmail.com</a>><br>
Content-Type: text/plain; charset="us-ascii"<br>
<br>
Dear Scikit-learn community,<br>
<br>
Since the documentation of the LDA (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>
<<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>>)
is not so clear, I would like to ask if the lda.coef_
attribute stores the eigenvectors from the SVD
decomposition.<br>
<br>
Thank you in advance,<br>
Serafeim<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/4263df5c/attachment-0001.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/4263df5c/<wbr>attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Mon, 16 Oct 2017 16:57:52 +0200<br>
From: Alexandre Gramfort <<a
href="mailto:alexandre.gramfort@inria.fr"
moz-do-not-send="true">alexandre.gramfort@inria.fr</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID:<br>
<<a
href="mailto:CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g@mail.gmail.com"
moz-do-not-send="true">CADeotZricOQhuHJMmW2Z14cqffEQ<wbr>yndYoxn-OgKAvTMQ7V0Y2g@mail.<wbr>gmail.com</a>><br>
Content-Type: text/plain; charset="UTF-8"<br>
<br>
no it stores the direction of the decision function to
match the API of<br>
linear models.<br>
<br>
HTH<br>
Alex<br>
<br>
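That correspondence with the linear-model API can be checked directly; a minimal sketch for the binary case (synthetic data, default 'svd' solver):<br>

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.randint(0, 2, 40)  # two classes

lda = LinearDiscriminantAnalysis().fit(X, y)

# coef_ and intercept_ parametrize the decision function,
# just as in other scikit-learn linear classifiers.
print(np.allclose(lda.decision_function(X),
                  X @ lda.coef_.ravel() + lda.intercept_))
```
<br>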
On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
wrote:<br>
> Dear Scikit-learn community,<br>
><br>
> Since the documentation of the LDA<br>
> (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
> is not so clear, I would like to ask if the lda.coef_
attribute stores the<br>
> eigenvectors from the SVD decomposition.<br>
><br>
> Thank you in advance,<br>
> Serafeim<br>
><br>
<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Mon, 16 Oct 2017 17:02:46 +0200<br>
From: Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID: <<a
href="mailto:413210D2-56AE-41A4-873F-D171BB36539D@gmail.com"
moz-do-not-send="true">413210D2-56AE-41A4-873F-<wbr>D171BB36539D@gmail.com</a>><br>
Content-Type: text/plain; charset="us-ascii"<br>
<br>
Dear Alex,<br>
<br>
Thank you for the prompt response.<br>
<br>
Are the eigenvectors stored in some variable?<br>
Does the lda.scalings_ attribute contain the eigenvectors?<br>
<br>
Best,<br>
Serafeim<br>
<br>
> On 16 Oct 2017, at 16:57, Alexandre Gramfort <<a
href="mailto:alexandre.gramfort@inria.fr"
moz-do-not-send="true">alexandre.gramfort@inria.fr</a>>
wrote:<br>
><br>
> no it stores the direction of the decision function
to match the API of<br>
> linear models.<br>
><br>
> HTH<br>
> Alex<br>
><br>
> On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
wrote:<br>
>> Dear Scikit-learn community,<br>
>><br>
>> Since the documentation of the LDA<br>
>> (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
>> is not so clear, I would like to ask if the
lda.coef_ attribute stores the<br>
>> eigenvectors from the SVD decomposition.<br>
>><br>
>> Thank you in advance,<br>
>> Serafeim<br>
>><br>
<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/505c7da3/attachment.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/505c7da3/<wbr>attachment.html</a>><br>
<br>
------------------------------<br>
<br>
Subject: Digest Footer<br>
<br>
______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br>
<br>
------------------------------<br>
<br>
End of scikit-learn Digest, Vol 19, Issue 25<br>
******************************<wbr>**************<br>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
scikit-learn mailing list
<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>
</pre>
</blockquote>
<br>
</body>
</html>