[scikit-learn] Should we standardize data before PCA?

Michael Eickenberg michael.eickenberg at gmail.com
Thu May 24 20:09:52 EDT 2018


Hi,

that depends entirely on the nature of your data and on whether the standard
deviations of the individual feature axes/columns carry some form of
importance. Note that, all else being equal, PCA will bias its loadings
towards columns with large standard deviations: if you have z-scored columns
and then pick one column and multiply it by, say, 1000, that column will
likely dominate your first component (provided 1000 is comparable to or large
with respect to the number of features you are using). See the sketch below.
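
Here is a minimal sketch of that effect on synthetic data (the array shapes,
the column index, and the factor of 1000 are my own illustrative choices, not
taken from your setup):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(500, 10)                  # 500 samples, 10 independent features
X = StandardScaler().fit_transform(X)   # z-score every column

X_scaled = X.copy()
X_scaled[:, 3] *= 1000                  # inflate the std of one column

pca = PCA(n_components=3).fit(X_scaled)
# The first component's loadings concentrate almost entirely on column 3,
# because PCA maximizes explained variance and that column now dominates it.
print(np.round(pca.components_[0], 3))
print(pca.explained_variance_ratio_)

Running this, the first explained-variance ratio should be close to 1 and the
first loading vector nearly a unit vector on column 3, which is the bias
described above.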

Does this help?
Michael

On Thu, May 24, 2018 at 4:39 PM, Shiheng Duan <shiduan at ucdavis.edu> wrote:

> Hello all,
>
> I wonder whether it is necessary or correct to do a z-score transformation
> before PCA. I didn't see any preprocessing of the face images in the "Faces
> recognition example using eigenfaces and SVMs" example, link:
> http://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py
>
> I am working on a similar dataset and got a weird result when I standardized
> the data before PCA. The components figure shows a strong gradient and
> doesn't make any sense. Any ideas about the reason?
>
> Thanks.
>