[scikit-learn] 1. Re: unclear help file for sklearn.decomposition.pca
Andreas Mueller
t3kcit at gmail.com
Mon Oct 16 14:44:51 EDT 2017
On 10/16/2017 02:27 PM, Ismael Lemhadri wrote:
> @Andreas Muller:
> My references do not assume centering, e.g.
> http://ufldl.stanford.edu/wiki/index.php/PCA
> any reference?
>
It kinda does, but it is not very clear about it:
"This data has already been pre-processed so that each of the features x_1 and x_2 have about the same mean (zero) and variance."
Wikipedia is much clearer:
"Consider a data matrix
<https://en.wikipedia.org/wiki/Matrix_%28mathematics%29>, X, with
column-wise zero empirical mean
<https://en.wikipedia.org/wiki/Empirical_mean> (the sample mean of each
column has been shifted to zero), where each of the n rows represents a
different repetition of the experiment, and each of the p columns gives
a particular kind of feature (say, the results from a particular sensor)."
https://en.wikipedia.org/wiki/Principal_component_analysis#Details
I'm a bit surprised to find that ESL says "The SVD of the centered
matrix X is another way of expressing the principal components of the
variables in X", so do they assume scaling? They don't really have a
great treatment of PCA, though.
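
For what it's worth, a quick check with numpy (just a sketch on made-up
random data, nothing taken from the docs) confirms that the right
singular vectors of the column-centered matrix match what
sklearn.decomposition.PCA fits, up to sign flips:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 3) * [1.0, 10.0, 100.0]  # features on very different scales

# SVD of the column-centered matrix, as in the Wikipedia/ESL formulation
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(n_components=3).fit(X)

# directions agree up to the sign of each component
print(np.allclose(np.abs(Vt), np.abs(pca.components_)))  # True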
Bishop <http://www.springer.com/us/book/9780387310732> and Murphy
<https://mitpress.mit.edu/books/machine-learning-0> are pretty clear
that they subtract the mean (or assume zero mean) but don't standardize.
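
In scikit-learn terms (again just an illustrative sketch, same made-up
data as above), centering by hand changes nothing because PCA already
does it, while standardizing the features does change the result:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 3) * [1.0, 10.0, 100.0]

pca_raw = PCA().fit(X)
pca_centered = PCA().fit(X - X.mean(axis=0))
pca_scaled = PCA().fit(StandardScaler().fit_transform(X))

# centering by hand makes no difference: PCA subtracts the mean itself
print(np.allclose(pca_raw.components_, pca_centered.components_))  # True

# standardizing does change the components, so PCA does not scale for you
print(np.allclose(np.abs(pca_raw.components_), np.abs(pca_scaled.components_)))  # typically False here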