<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Dear Roman,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">My concern is actually not about not mentioning the scaling but about not mentioning the centering.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">That is, the sklearn PCA removes the mean but it does not mention it in the help file.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">This was quite messy for me to debug as I expected it to either: 1/ center and scale simultaneously or / not scale and not center either.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">It would be beneficial to explicit the behavior in the help file in my opinion.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Ismael</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 16, 2017 at 8:02 AM, <span dir="ltr"><<a href="mailto:scikit-learn-request@python.org" target="_blank">scikit-learn-request@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send scikit-learn mailing list submissions to<br>
<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

To subscribe or unsubscribe via the World Wide Web, visit
	https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
	scikit-learn-request@python.org

You can reach the person managing the list at
	scikit-learn-owner@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."


Today's Topics:

   1. unclear help file for sklearn.decomposition.pca (Ismael Lemhadri)
   2. Re: unclear help file for sklearn.decomposition.pca (Roman Yurchak)
   3. Question about LDA's coef_ attribute (Serafeim Loukas)
   4. Re: Question about LDA's coef_ attribute (Alexandre Gramfort)
   5. Re: Question about LDA's coef_ attribute (Serafeim Loukas)


----------------------------------------------------------------------

Message: 1
Date: Sun, 15 Oct 2017 18:42:56 -0700
From: Ismael Lemhadri <lemhadri@stanford.edu>
To: scikit-learn@python.org
Subject: [scikit-learn] unclear help file for sklearn.decomposition.pca
Message-ID: <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18oKHCFN0u5DNzj+g@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear all,
The help file for the PCA class is unclear about the preprocessing
performed on the data.
You can check on line 410 here:
https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
that the matrix is centered but NOT scaled before the singular
value decomposition is performed.
However, the help files make no mention of this.
This is confusing for someone who, like me, just wanted to check that
PCA and np.linalg.svd give the same results. In academic settings, students
are often asked to compare different methods and to check that they yield
the same results. I expect that many students have run into this problem
before...
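For instance, a minimal check of the centering (my own snippet, not from the
original message): the fitted estimator exposes the mean it subtracts as the
mean_ attribute.

```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(0).rand(6, 3)
>>> pca = PCA().fit(X)
>>> np.allclose(pca.mean_, X.mean(axis=0))  # the mean that fit() removed
True
```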
Best,
Ismael Lemhadri

------------------------------

Message: 2
Date: Mon, 16 Oct 2017 15:16:45 +0200
From: Roman Yurchak <rth.yurchak@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] unclear help file for sklearn.decomposition.pca
Message-ID: <b2abdcfd-4736-929e-6304-b93832932043@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Ismael,

as far as I can see, sklearn.decomposition.PCA doesn't mention scaling at
all (except for the whiten parameter, which is post-transformation scaling).

So since it doesn't mention it, it makes sense that it doesn't do any
scaling of the input. Same as np.linalg.svd.

You can verify that PCA and np.linalg.svd yield the same results, with

```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(42).rand(10, 4)
>>> n_components = 2
>>> PCA(n_components, svd_solver='full').fit_transform(X)
```

and

```
>>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
>>> (X - X.mean(axis=0)).dot(V[:n_components].T)
```
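
One caveat for this comparison (a note added here, not part of the original
message): the sign of each singular vector is arbitrary, and scikit-learn's
PCA applies its own deterministic sign convention (svd_flip), so corresponding
columns of the two results may differ by a factor of -1. Continuing the same
session, comparing absolute values avoids false mismatches:

```
>>> P1 = PCA(n_components, svd_solver='full').fit_transform(X)
>>> P2 = (X - X.mean(axis=0)).dot(V[:n_components].T)
>>> np.allclose(np.abs(P1), np.abs(P2))  # equal up to per-column signs
True
```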

-- 
Roman

On 16/10/17 03:42, Ismael Lemhadri wrote:
> [...]


------------------------------

Message: 3
Date: Mon, 16 Oct 2017 15:27:48 +0200
From: Serafeim Loukas <seralouk@gmail.com>
To: scikit-learn@python.org
Subject: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <58C6D0DA-9DE5-4EF5-97C1-48159831F5A9@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Dear Scikit-learn community,

Since the documentation of the LDA
(http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
is not so clear, I would like to ask whether the lda.coef_ attribute stores
the eigenvectors from the SVD decomposition.

Thank you in advance,
Serafeim

------------------------------

Message: 4
Date: Mon, 16 Oct 2017 16:57:52 +0200
From: Alexandre Gramfort <alexandre.gramfort@inria.fr>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

No, it stores the direction of the decision function, to match the API of
linear models.

HTH,
Alex
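
To make this concrete, here is a minimal sketch (toy data of my own, not
from the thread) showing where each quantity lives; the shapes assume a
binary problem:

```
>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> rng = np.random.RandomState(0)
>>> X, y = rng.rand(20, 4), np.repeat([0, 1], 10)
>>> lda = LinearDiscriminantAnalysis().fit(X, y)
>>> lda.coef_.shape      # decision-function weights: (1, n_features) for 2 classes
(1, 4)
>>> lda.scalings_.shape  # directions applied by transform(): (n_features, n_components)
(4, 1)
```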

On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <seralouk@gmail.com> wrote:
> [...]


------------------------------

Message: 5
Date: Mon, 16 Oct 2017 17:02:46 +0200
From: Serafeim Loukas <seralouk@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Question about LDA's coef_ attribute
Message-ID: <413210D2-56AE-41A4-873F-D171BB36539D@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Dear Alex,

Thank you for the prompt response.

Are the eigenvectors stored in some variable?
Does the lda.scalings_ attribute contain the eigenvectors?

Best,
Serafeim

> On 16 Oct 2017, at 16:57, Alexandre Gramfort <alexandre.gramfort@inria.fr> wrote:
> [...]

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


------------------------------

End of scikit-learn Digest, Vol 19, Issue 25
********************************************