[scikit-learn] unclear help file for sklearn.decomposition.pca
Roman Yurchak
rth.yurchak at gmail.com
Mon Oct 16 09:16:45 EDT 2017
Ismael,
as far as I can see, the sklearn.decomposition.PCA docstring doesn't mention
scaling at all (except for the whiten parameter, which is post-transformation
scaling). Since scaling isn't mentioned, it makes sense that none is applied
to the input. This matches the behavior of np.linalg.svd.
You can verify that PCA and np.linalg.svd yield the same results (up to a
possible sign flip of each component, since the sign of a singular vector is
arbitrary), with
```
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.random.RandomState(42).rand(10, 4)
>>> n_components = 2
>>> PCA(n_components, svd_solver='full').fit_transform(X)
```
and
```
>>> Xc = X - X.mean(axis=0)
>>> U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
>>> Xc.dot(Vt[:n_components].T)
```
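For instance, here is a self-contained sketch of that comparison. It checks
the two transforms agree numerically on absolute values, since each component
may come out with a flipped sign (scikit-learn applies a deterministic sign
convention internally, while raw np.linalg.svd does not):

```python
import numpy as np
from sklearn.decomposition import PCA

# Reproducible data, as in the snippets above.
X = np.random.RandomState(42).rand(10, 4)
n_components = 2

# scikit-learn's PCA with the full SVD solver: centers X, then projects.
T_pca = PCA(n_components, svd_solver='full').fit_transform(X)

# Manual route: center, SVD, project onto the top right singular vectors.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T_svd = Xc.dot(Vt[:n_components].T)

# Columns agree up to a per-component sign flip, so compare magnitudes.
print(np.allclose(np.abs(T_pca), np.abs(T_svd)))  # True
```

Note that no scaling (e.g. dividing by per-feature standard deviation) is
needed to make the two match; only centering is.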
--
Roman
On 16/10/17 03:42, Ismael Lemhadri wrote:
> Dear all,
> The help file for the PCA class is unclear about the preprocessing
> performed on the data.
> You can check on line 410 here:
> https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/decomposition/pca.py#L410
> that the matrix is centered but NOT scaled, before performing the
> singular value decomposition.
> However, the help files do not make any mention of it.
> This is unclear for someone who, like me, just wanted to check that
> PCA and np.linalg.svd give the same results. In academic settings,
> students are often asked to compare different methods and to check that
> they yield the same results. I expect that many students have run into
> this problem before...
> Best,
> Ismael Lemhadri
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>