<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
The definition of PCA has a centering step, but no scaling step.<br>
<br>
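This is easy to confirm from the fitted estimator itself; a minimal sketch using the public `mean_` and `components_` attributes (synthetic data for illustration):<br>

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(20, 3) * 10 + 5  # nonzero mean, unscaled features

pca = PCA(n_components=2).fit(X)

# PCA stores the per-feature mean it subtracts before the SVD ...
assert np.allclose(pca.mean_, X.mean(axis=0))

# ... but it does not divide by the standard deviation: projecting the
# merely-centered data reproduces pca.transform exactly.
assert np.allclose(pca.transform(X),
                   (X - X.mean(axis=0)) @ pca.components_.T)
```

If scaling were part of the transform, the second check would fail unless the data had been standardized beforehand (e.g. with StandardScaler).<br>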
<div class="moz-cite-prefix">On 10/16/2017 11:16 AM, Ismael Lemhadri
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANpSPFQs-zfUeggZeMsN6NdLGv512MDCW2ZqB8cgK1hFFdfHfw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Dear Roman,</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">My concern is
actually not that the scaling goes unmentioned, but that the
centering does.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">That is,
sklearn's PCA removes the mean, but the help file does not
mention it.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">This was quite
messy for me to debug, as I expected it to either 1/ center
and scale simultaneously, or 2/ neither center nor scale.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">In my opinion,
it would be beneficial to make this behavior explicit in the
help file.</div>
<div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">Ismael</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Oct 16, 2017 at 8:02 AM, <span
dir="ltr"><<a
href="mailto:scikit-learn-request@python.org"
target="_blank" moz-do-not-send="true">scikit-learn-request@python.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Send
scikit-learn mailing list submissions to<br>
<a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
or, via email, send a message with subject or body 'help'
to<br>
<a href="mailto:scikit-learn-request@python.org"
moz-do-not-send="true">scikit-learn-request@python.<wbr>org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:scikit-learn-owner@python.org"
moz-do-not-send="true">scikit-learn-owner@python.org</a><br>
<br>
When replying, please edit your Subject line so it is more
specific<br>
than "Re: Contents of scikit-learn digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. unclear help file for sklearn.decomposition.pca
(Ismael Lemhadri)<br>
2. Re: unclear help file for sklearn.decomposition.pca<br>
(Roman Yurchak)<br>
3. Question about LDA's coef_ attribute (Serafeim
Loukas)<br>
4. Re: Question about LDA's coef_ attribute (Alexandre
Gramfort)<br>
5. Re: Question about LDA's coef_ attribute (Serafeim
Loukas)<br>
<br>
<br>
------------------------------<wbr>------------------------------<wbr>----------<br>
<br>
Message: 1<br>
Date: Sun, 15 Oct 2017 18:42:56 -0700<br>
From: Ismael Lemhadri <<a
href="mailto:lemhadri@stanford.edu"
moz-do-not-send="true">lemhadri@stanford.edu</a>><br>
To: <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
Subject: [scikit-learn] unclear help file for<br>
sklearn.decomposition.pca<br>
Message-ID:<br>
<CANpSPFTgv+<wbr>Oz7f97dandmrBBayqf_o9w=<a
href="mailto:18oKHCFN0u5DNzj%2Bg@mail.gmail.com"
moz-do-not-send="true">18oKHCF<wbr>N0u5DNzj+g@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Dear all,<br>
The help file for the PCA class is unclear about the
preprocessing<br>
performed on the data.<br>
You can check on line 410 here:<br>
<a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/%0Adecomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<br>
decomposition/pca.py#L410</a><br>
that the matrix is centered but NOT scaled, before
performing the singular<br>
value decomposition.<br>
However, the help files do not make any mention of it.<br>
This is unclear for someone who, like me, just wanted to
verify that<br>
PCA and np.linalg.svd give the same results. In academic
settings, students<br>
are often asked to compare different methods and to check
that they yield<br>
the same results. I expect that many students have
confronted this problem<br>
before...<br>
Best,<br>
Ismael Lemhadri<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171015/c465bde7/attachment-0001.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171015/c465bde7/<wbr>attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 16 Oct 2017 15:16:45 +0200<br>
From: Roman Yurchak <<a
href="mailto:rth.yurchak@gmail.com"
moz-do-not-send="true">rth.yurchak@gmail.com</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] unclear help file for<br>
sklearn.decomposition.pca<br>
Message-ID: <<a
href="mailto:b2abdcfd-4736-929e-6304-b93832932043@gmail.com"
moz-do-not-send="true">b2abdcfd-4736-929e-6304-<wbr>b93832932043@gmail.com</a>><br>
Content-Type: text/plain; charset=utf-8; format=flowed<br>
<br>
Ismael,<br>
<br>
as far as I saw the sklearn.decomposition.PCA doesn't
mention scaling at<br>
all (except for the whiten parameter which is
post-transformation scaling).<br>
<br>
So since it doesn't mention it, it makes sense that it
doesn't do any<br>
scaling of the input. Same as np.linalg.svd.<br>
<br>
You can verify that PCA and np.linalg.svd yield the same
results, with<br>
<br>
```<br>
>>> import numpy as np<br>
>>> from sklearn.decomposition import PCA<br>
>>> import numpy.linalg<br>
>>> X = np.random.RandomState(42).rand(10, 4)<br>
>>> n_components = 2<br>
>>> PCA(n_components, svd_solver='full').fit_transform(X)<br>
```<br>
<br>
and<br>
<br>
```<br>
>>> U, s, V = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)<br>
>>> (X - X.mean(axis=0)).dot(V[:n_components].T)<br>
```<br>
<br>
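One caveat when making this comparison: singular-vector signs are arbitrary, so the two results can differ by a per-column sign flip. A minimal sketch that compares them up to sign (same synthetic data as above):<br>

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(42).rand(10, 4)
n_components = 2

T_pca = PCA(n_components, svd_solver='full').fit_transform(X)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T_svd = Xc @ Vt[:n_components].T

# Align each column's sign before comparing.
signs = np.sign((T_pca * T_svd).sum(axis=0))
print(np.allclose(T_pca, T_svd * signs))
```
<br>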
--<br>
Roman<br>
<br>
On 16/10/17 03:42, Ismael Lemhadri wrote:<br>
> Dear all,<br>
> The help file for the PCA class is unclear about the
preprocessing<br>
> performed on the data.<br>
> You can check on line 410 here:<br>
> <a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a><br>
> <<a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a>><br>
> that the matrix is centered but NOT scaled, before
performing the<br>
> singular value decomposition.<br>
> However, the help files do not make any mention of
it.<br>
> This is unclear for someone who, like me, just wanted
to verify that<br>
> PCA and np.linalg.svd give the same results. In
academic settings,<br>
> students are often asked to compare different methods
and to check that<br>
> they yield the same results. I expect that many
students have confronted<br>
> this problem before...<br>
> Best,<br>
> Ismael Lemhadri<br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
> <a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
><br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Mon, 16 Oct 2017 15:27:48 +0200<br>
From: Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
To: <a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
Subject: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID: <<a
href="mailto:58C6D0DA-9DE5-4EF5-97C1-48159831F5A9@gmail.com"
moz-do-not-send="true">58C6D0DA-9DE5-4EF5-97C1-<wbr>48159831F5A9@gmail.com</a>><br>
Content-Type: text/plain; charset="us-ascii"<br>
<br>
Dear Scikit-learn community,<br>
<br>
Since the documentation of the LDA (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>
<<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>>)
is not so clear, I would like to ask if the lda.coef_
attribute stores the eigenvectors from the SVD
decomposition.<br>
<br>
Thank you in advance,<br>
Serafeim<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/4263df5c/attachment-0001.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/4263df5c/<wbr>attachment-0001.html</a>><br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Mon, 16 Oct 2017 16:57:52 +0200<br>
From: Alexandre Gramfort <<a
href="mailto:alexandre.gramfort@inria.fr"
moz-do-not-send="true">alexandre.gramfort@inria.fr</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID:<br>
<<a
href="mailto:CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g@mail.gmail.com"
moz-do-not-send="true">CADeotZricOQhuHJMmW2Z14cqffEQ<wbr>yndYoxn-OgKAvTMQ7V0Y2g@mail.<wbr>gmail.com</a>><br>
Content-Type: text/plain; charset="UTF-8"<br>
<br>
no it stores the direction of the decision function to
match the API of<br>
linear models.<br>
<br>
HTH<br>
Alex<br>
<br>
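That correspondence with the linear-model API can be checked directly; a minimal sketch for the binary case (synthetic data, default 'svd' solver):<br>

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.randint(0, 2, 40)  # two classes

lda = LinearDiscriminantAnalysis().fit(X, y)

# coef_ and intercept_ parametrize the decision function,
# just as in other scikit-learn linear classifiers.
print(np.allclose(lda.decision_function(X),
                  X @ lda.coef_.ravel() + lda.intercept_))
```
<br>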
On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
wrote:<br>
> Dear Scikit-learn community,<br>
><br>
> Since the documentation of the LDA<br>
> (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
> is not so clear, I would like to ask if the lda.coef_
attribute stores the<br>
> eigenvectors from the SVD decomposition.<br>
><br>
> Thank you in advance,<br>
> Serafeim<br>
><br>
<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Mon, 16 Oct 2017 17:02:46 +0200<br>
From: Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
To: Scikit-learn mailing list <<a
href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a>><br>
Subject: Re: [scikit-learn] Question about LDA's coef_
attribute<br>
Message-ID: <<a
href="mailto:413210D2-56AE-41A4-873F-D171BB36539D@gmail.com"
moz-do-not-send="true">413210D2-56AE-41A4-873F-<wbr>D171BB36539D@gmail.com</a>><br>
Content-Type: text/plain; charset="us-ascii"<br>
<br>
Dear Alex,<br>
<br>
Thank you for the prompt response.<br>
<br>
Are the eigenvectors stored in some variable?<br>
Does the lda.scalings_ attribute contain the eigenvectors?<br>
<br>
Best,<br>
Serafeim<br>
<br>
> On 16 Oct 2017, at 16:57, Alexandre Gramfort <<a
href="mailto:alexandre.gramfort@inria.fr"
moz-do-not-send="true">alexandre.gramfort@inria.fr</a>>
wrote:<br>
><br>
> no it stores the direction of the decision function
to match the API of<br>
> linear models.<br>
><br>
> HTH<br>
> Alex<br>
><br>
> On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
wrote:<br>
>> Dear Scikit-learn community,<br>
>><br>
>> Since the documentation of the LDA<br>
>> (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
>> is not so clear, I would like to ask if the
lda.coef_ attribute stores the<br>
>> eigenvectors from the SVD decomposition.<br>
>><br>
>> Thank you in advance,<br>
>> Serafeim<br>
>><br>
<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/505c7da3/attachment.html"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/505c7da3/<wbr>attachment.html</a>><br>
<br>
------------------------------<br>
<br>
Subject: Digest Footer<br>
<br>
______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org"
moz-do-not-send="true">scikit-learn@python.org</a><br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br>
<br>
------------------------------<br>
<br>
End of scikit-learn Digest, Vol 19, Issue 25<br>
******************************<wbr>**************<br>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
scikit-learn mailing list
<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>
</pre>
</blockquote>
<br>
</body>
</html>