[scikit-learn] Should we standardize data before PCA?

Shiheng Duan shiduan at ucdavis.edu
Sun May 27 01:10:07 EDT 2018


Thanks.

Do you mean that if feature one has a larger standard deviation than feature
two, they will have the same weight after z-scoring? In that case, z-scoring
introduces a bias, right? Feature one should be more important than feature
two in the PCA.
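
The scaling effect Michael describes in the quoted reply below can be checked
numerically. The following is only a minimal sketch on synthetic data: the
random data, the choice of column 2, and all variable names are assumptions
made for illustration and are not taken from the thread or from the eigenfaces
example. It z-scores five independent columns, multiplies one of them by 1000,
and compares PCA on that data with PCA after z-scoring again.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Synthetic data (assumption for illustration): 500 samples, 5 independent features.
X = rng.normal(size=(500, 5))

# z-score every column, then inflate column 2 by a factor of 1000.
X_z = StandardScaler().fit_transform(X)
X_scaled = X_z.copy()
X_scaled[:, 2] *= 1000.0

# PCA on the inflated data: find which column has the largest absolute
# loading on the first component.
pca_raw = PCA(n_components=2).fit(X_scaled)
dominant_raw = int(np.argmax(np.abs(pca_raw.components_[0])))
print("column with largest loading on PC1:", dominant_raw)
print("explained variance ratio (inflated):", pca_raw.explained_variance_ratio_)

# z-scoring again gives every column unit variance, so no column is
# favoured purely because of its scale.
X_rez = StandardScaler().fit_transform(X_scaled)
pca_z = PCA(n_components=2).fit(X_rez)
print("explained variance ratio (z-scored):", pca_z.explained_variance_ratio_)
```

Whether the second behaviour counts as a bias or as a correction depends on
whether the original column scales carry meaning, which is the question the
quoted reply raises.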

On Thu, May 24, 2018 at 5:09 PM, Michael Eickenberg <
michael.eickenberg at gmail.com> wrote:

> Hi,
>
> that totally depends on the nature of your data and whether the standard
> deviations of the individual feature axes/columns of your data carry some
> form of importance measure. Note that PCA will bias its loadings towards
> columns with large standard deviations, all else being equal (meaning that
> if you have z-scored columns, and then you choose one column and multiply it
> by, say, 1000, then that column will likely dominate your first component
> [if 1000 is comparable to or large with respect to the number of features
> you are using]).
>
> Does this help?
> Michael
>
> On Thu, May 24, 2018 at 4:39 PM, Shiheng Duan <shiduan at ucdavis.edu> wrote:
>
>> Hello all,
>>
>> I wonder whether it is necessary or correct to apply a z-score
>> transformation before PCA. I didn't see any preprocessing of the face
>> images in the example "Faces recognition example using eigenfaces and
>> SVMs", link:
>> http://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py
>>
>> I am working on a similar dataset and got a weird result when I
>> standardized the data before PCA. The plotted components have a strong
>> gradient, which doesn't make any sense. Any ideas about the reason?
>>
>> Thanks.
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>

