<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    The definition of PCA has a centering step, but no scaling step.<br>
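A quick way to see this (a minimal sketch; the random matrix and the two-component projection are only for illustration):<br>
<br>
```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: any matrix will do.
X = np.random.RandomState(42).rand(10, 4)
n_components = 2

# sklearn's PCA subtracts the column means internally, but does not scale.
pca_scores = PCA(n_components, svd_solver="full").fit_transform(X)

# The same projection by hand: center first, then a plain SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
manual_scores = Xc.dot(Vt[:n_components].T)

# Equal up to a per-component sign, which the SVD leaves arbitrary.
assert np.allclose(np.abs(pca_scores), np.abs(manual_scores))
```
<br>
The comparison uses absolute values because the SVD leaves the sign of each component arbitrary.<br>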
    <br>
    <div class="moz-cite-prefix">On 10/16/2017 11:16 AM, Ismael Lemhadri
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CANpSPFQs-zfUeggZeMsN6NdLGv512MDCW2ZqB8cgK1hFFdfHfw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">Dear Roman,</div>
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">My concern is
          actually not about not mentioning the scaling but about not
          mentioning the centering.</div>
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">That is, the
          sklearn PCA removes the mean but it does not mention it in the
          help file.</div>
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">This was quite
          messy for me to debug, as I expected it either to 1/ center
          and scale simultaneously, or 2/ neither center nor scale.</div>
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">In my opinion, it
          would be helpful to make this behavior explicit in the help
          file.</div>
        <div class="gmail_default"
          style="font-family:arial,helvetica,sans-serif">Ismael</div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Mon, Oct 16, 2017 at 8:02 AM, <span
              dir="ltr"><<a
                href="mailto:scikit-learn-request@python.org"
                target="_blank" moz-do-not-send="true">scikit-learn-request@python.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Send
              scikit-learn mailing list submissions to<br>
                      <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              <br>
              To subscribe or unsubscribe via the World Wide Web, visit<br>
                      <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              or, via email, send a message with subject or body 'help'
              to<br>
                      <a href="mailto:scikit-learn-request@python.org"
                moz-do-not-send="true">scikit-learn-request@python.<wbr>org</a><br>
              <br>
              You can reach the person managing the list at<br>
                      <a href="mailto:scikit-learn-owner@python.org"
                moz-do-not-send="true">scikit-learn-owner@python.org</a><br>
              <br>
              When replying, please edit your Subject line so it is more
              specific<br>
              than "Re: Contents of scikit-learn digest..."<br>
              <br>
              <br>
              Today's Topics:<br>
              <br>
                 1. unclear help file for sklearn.decomposition.pca
              (Ismael Lemhadri)<br>
                 2. Re: unclear help file for sklearn.decomposition.pca<br>
                    (Roman Yurchak)<br>
                 3. Question about LDA's coef_ attribute (Serafeim
              Loukas)<br>
                 4. Re: Question about LDA's coef_ attribute (Alexandre
              Gramfort)<br>
                 5. Re: Question about LDA's coef_ attribute (Serafeim
              Loukas)<br>
              <br>
              <br>
              ------------------------------<wbr>------------------------------<wbr>----------<br>
              <br>
              Message: 1<br>
              Date: Sun, 15 Oct 2017 18:42:56 -0700<br>
              From: Ismael Lemhadri <<a
                href="mailto:lemhadri@stanford.edu"
                moz-do-not-send="true">lemhadri@stanford.edu</a>><br>
              To: <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              Subject: [scikit-learn] unclear help file for<br>
                      sklearn.decomposition.pca<br>
              Message-ID:<br>
                      <CANpSPFTgv+Oz7f97dandmrBBayqf_o9w=18oKHCFN0u5DNzj+g@mail.gmail.com><br>
              Content-Type: text/plain; charset="utf-8"<br>
              <br>
              Dear all,<br>
              The help file for the PCA class is unclear about the
              preprocessing<br>
              performed on the data.<br>
              You can check on line 410 here:<br>
              <a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410</a><br>
              that the matrix is centered but NOT scaled, before
              performing the singular<br>
              value decomposition.<br>
              However, the help files do not make any mention of it.<br>
              This is confusing for someone who, like me, simply wanted to
              check that<br>
              PCA and np.linalg.svd give the same results. In academic
              settings, students<br>
              are often asked to compare different methods and to check
              that they yield<br>
              the same results. I expect that many students have
              confronted this problem<br>
              before...<br>
              Best,<br>
              Ismael Lemhadri<br>
              -------------- next part --------------<br>
              An HTML attachment was scrubbed...<br>
              URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171015/c465bde7/attachment-0001.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171015/c465bde7/<wbr>attachment-0001.html</a>><br>
              <br>
              ------------------------------<br>
              <br>
              Message: 2<br>
              Date: Mon, 16 Oct 2017 15:16:45 +0200<br>
              From: Roman Yurchak <<a
                href="mailto:rth.yurchak@gmail.com"
                moz-do-not-send="true">rth.yurchak@gmail.com</a>><br>
              To: Scikit-learn mailing list <<a
                href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a>><br>
              Subject: Re: [scikit-learn] unclear help file for<br>
                      sklearn.decomposition.pca<br>
              Message-ID: <<a
                href="mailto:b2abdcfd-4736-929e-6304-b93832932043@gmail.com"
                moz-do-not-send="true">b2abdcfd-4736-929e-6304-<wbr>b93832932043@gmail.com</a>><br>
              Content-Type: text/plain; charset=utf-8; format=flowed<br>
              <br>
              Ismael,<br>
              <br>
              as far as I can see, sklearn.decomposition.PCA doesn't
              mention scaling at<br>
              all (except for the whiten parameter, which is
              post-transformation scaling).<br>
              <br>
              Since the documentation doesn't mention scaling, it makes
              sense that it<br>
              doesn't do any scaling of the input. The same holds for
              np.linalg.svd.<br>
              <br>
              You can verify that PCA and np.linalg.svd yield the same
              results (up to the sign of each component, which the SVD
              leaves arbitrary), with<br>
              <br>
              ```<br>
               >>> import numpy as np<br>
               >>> from sklearn.decomposition import PCA<br>
               >>> import numpy.linalg<br>
               >>> X = np.random.RandomState(42).rand(10, 4)<br>
               >>> n_components = 2<br>
               >>> PCA(n_components, svd_solver='full').fit_transform(X)<br>
              ```<br>
              <br>
              and<br>
              <br>
              ```<br>
               >>> U, s, V = np.linalg.svd(X - X.mean(axis=0),
               full_matrices=False)<br>
               >>> (X - X.mean(axis=0)).dot(V[:n_components].T)<br>
              ```<br>
              <br>
              --<br>
              Roman<br>
              <br>
              On 16/10/17 03:42, Ismael Lemhadri wrote:<br>
              > Dear all,<br>
              > The help file for the PCA class is unclear about the
              preprocessing<br>
              > performed to the data.<br>
              > You can check on line 410 here:<br>
              > <a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a><br>
              > <<a
href="https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/scikit-<wbr>learn/scikit-learn/blob/<wbr>ef5cb84a/sklearn/<wbr>decomposition/pca.py#L410</a>><br>
              > that the matrix is centered but NOT scaled, before
              performing the<br>
              > singular value decomposition.<br>
              > However, the help files do not make any mention of
              it.<br>
              > This is unclear for someone who, like me, just wanted
              to compare that<br>
              > the PCA and np.linalg.svd give the same results. In
              academic settings,<br>
              > students are often asked to compare different methods
              and to check that<br>
              > they yield the same results. I expect that many
              students have confronted<br>
              > this problem before...<br>
              > Best,<br>
              > Ismael Lemhadri<br>
              ><br>
              ><br>
              > ______________________________<wbr>_________________<br>
              > scikit-learn mailing list<br>
              > <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              > <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              ><br>
              <br>
              <br>
              <br>
              ------------------------------<br>
              <br>
              Message: 3<br>
              Date: Mon, 16 Oct 2017 15:27:48 +0200<br>
              From: Serafeim Loukas <<a
                href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
              To: <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              Subject: [scikit-learn] Question about LDA's coef_
              attribute<br>
              Message-ID: <<a
                href="mailto:58C6D0DA-9DE5-4EF5-97C1-48159831F5A9@gmail.com"
                moz-do-not-send="true">58C6D0DA-9DE5-4EF5-97C1-<wbr>48159831F5A9@gmail.com</a>><br>
              Content-Type: text/plain; charset="us-ascii"<br>
              <br>
              Dear Scikit-learn community,<br>
              <br>
              Since the documentation of the LDA (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>
              <<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>>)
              is not so clear, I would like to ask if the lda.coef_
              attribute stores the eigenvectors from the SVD
              decomposition.<br>
              <br>
              Thank you in advance,<br>
              Serafeim<br>
              -------------- next part --------------<br>
              An HTML attachment was scrubbed...<br>
              URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/4263df5c/attachment-0001.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/4263df5c/<wbr>attachment-0001.html</a>><br>
              <br>
              ------------------------------<br>
              <br>
              Message: 4<br>
              Date: Mon, 16 Oct 2017 16:57:52 +0200<br>
              From: Alexandre Gramfort <<a
                href="mailto:alexandre.gramfort@inria.fr"
                moz-do-not-send="true">alexandre.gramfort@inria.fr</a>><br>
              To: Scikit-learn mailing list <<a
                href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a>><br>
              Subject: Re: [scikit-learn] Question about LDA's coef_
              attribute<br>
              Message-ID:<br>
                      <<a
href="mailto:CADeotZricOQhuHJMmW2Z14cqffEQyndYoxn-OgKAvTMQ7V0Y2g@mail.gmail.com"
                moz-do-not-send="true">CADeotZricOQhuHJMmW2Z14cqffEQ<wbr>yndYoxn-OgKAvTMQ7V0Y2g@mail.<wbr>gmail.com</a>><br>
              Content-Type: text/plain; charset="UTF-8"<br>
              <br>
              No, it stores the direction of the decision function, to
              match the API of<br>
              linear models.<br>
              <br>
              HTH<br>
              Alex<br>
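              To make the distinction concrete, a minimal sketch (the toy
              data, the 'eigen' solver, and the 3-class/4-feature shapes
              are illustrative choices, not from the thread):<br>
              <br>
```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: 60 samples, 4 features, 3 balanced classes.
X = np.random.RandomState(0).rand(60, 4)
y = np.arange(60) % 3

lda = LinearDiscriminantAnalysis(solver="eigen").fit(X, y)

# coef_ parametrizes the decision function, as in other linear models:
# one row of feature weights per class.
assert lda.coef_.shape == (3, 4)

# scalings_ holds the discriminant directions (the generalized
# eigenvectors for the 'eigen' solver), as columns in feature space.
assert lda.scalings_.shape[0] == X.shape[1]
```
              <br>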
              <br>
              On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
                href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
              wrote:<br>
              > Dear Scikit-learn community,<br>
              ><br>
              > Since the documentation of the LDA<br>
              > (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
              > is not so clear, I would like to ask if the lda.coef_
              attribute stores the<br>
              > eigenvectors from the SVD decomposition.<br>
              ><br>
              > Thank you in advance,<br>
              > Serafeim<br>
              ><br>
              > ______________________________<wbr>_________________<br>
              > scikit-learn mailing list<br>
              > <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              > <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              ><br>
              <br>
              <br>
              ------------------------------<br>
              <br>
              Message: 5<br>
              Date: Mon, 16 Oct 2017 17:02:46 +0200<br>
              From: Serafeim Loukas <<a
                href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>><br>
              To: Scikit-learn mailing list <<a
                href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a>><br>
              Subject: Re: [scikit-learn] Question about LDA's coef_
              attribute<br>
              Message-ID: <<a
                href="mailto:413210D2-56AE-41A4-873F-D171BB36539D@gmail.com"
                moz-do-not-send="true">413210D2-56AE-41A4-873F-<wbr>D171BB36539D@gmail.com</a>><br>
              Content-Type: text/plain; charset="us-ascii"<br>
              <br>
              Dear Alex,<br>
              <br>
              Thank you for the prompt response.<br>
              <br>
              Are the eigenvectors stored in some variable ?<br>
              Does the lda.scalings_ attribute contain the eigenvectors
              ?<br>
              <br>
              Best,<br>
              Serafeim<br>
              <br>
              > On 16 Oct 2017, at 16:57, Alexandre Gramfort <<a
                href="mailto:alexandre.gramfort@inria.fr"
                moz-do-not-send="true">alexandre.gramfort@inria.fr</a>>
              wrote:<br>
              ><br>
              > no it stores the direction of the decision function
              to match the API of<br>
              > linear models.<br>
              ><br>
              > HTH<br>
              > Alex<br>
              ><br>
              > On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas <<a
                href="mailto:seralouk@gmail.com" moz-do-not-send="true">seralouk@gmail.com</a>>
              wrote:<br>
              >> Dear Scikit-learn community,<br>
              >><br>
              >> Since the documentation of the LDA<br>
              >> (<a
href="http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://scikit-learn.org/<wbr>stable/modules/generated/<wbr>sklearn.discriminant_analysis.<wbr>LinearDiscriminantAnalysis.<wbr>html</a>)<br>
              >> is not so clear, I would like to ask if the
              lda.coef_ attribute stores the<br>
              >> eigenvectors from the SVD decomposition.<br>
              >><br>
              >> Thank you in advance,<br>
              >> Serafeim<br>
              >><br>
              >> ______________________________<wbr>_________________<br>
              >> scikit-learn mailing list<br>
              >> <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              >> <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              >><br>
              > ______________________________<wbr>_________________<br>
              > scikit-learn mailing list<br>
              > <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              > <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              <br>
              -------------- next part --------------<br>
              An HTML attachment was scrubbed...<br>
              URL: <<a
href="http://mail.python.org/pipermail/scikit-learn/attachments/20171016/505c7da3/attachment.html"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://mail.python.org/<wbr>pipermail/scikit-learn/<wbr>attachments/20171016/505c7da3/<wbr>attachment.html</a>><br>
              <br>
              ------------------------------<br>
              <br>
              Subject: Digest Footer<br>
              <br>
              ______________________________<wbr>_________________<br>
              scikit-learn mailing list<br>
              <a href="mailto:scikit-learn@python.org"
                moz-do-not-send="true">scikit-learn@python.org</a><br>
              <a
                href="https://mail.python.org/mailman/listinfo/scikit-learn"
                rel="noreferrer" target="_blank" moz-do-not-send="true">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
              <br>
              <br>
              ------------------------------<br>
              <br>
              End of scikit-learn Digest, Vol 19, Issue 25<br>
              ******************************<wbr>**************<br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
scikit-learn mailing list
<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>