# [Tutor] Help- Regarding python

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Feb 5 01:21:24 CET 2013

```
On 4 February 2013 06:24, Gayathri S <gayathri.s112 at gmail.com> wrote:
> Hi All....!
>             If i have data set like this means...
>
> 3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
> 438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
> 439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
> 440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
> 444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
> 451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
> 458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496.

PCA only makes sense for multivariate data: your data should be a set
of vectors *all of the same length*. I'll assume that you were just
being lazy when you posted it and that you didn't bother to copy the
first and last lines properly...

[snip]
>
> Shall i use the following code for doing PCA on given input? could you tell
> me?

This code you posted is all screwed up. It will give you errors if you
try to run it.

Also I don't really know what you mean by "doing PCA". The code below
transforms your data into PCA space and plots a 2D scatter plot using
the first two principal components.

#!/usr/bin/env python
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
[438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10],
[439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10],
[440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10],
[444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10],
[451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20],
])

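# Compute the eigenvalues and vectors of the covariance matrix.
# C is symmetric, so use eigh: it returns real eigenvalues, whereas eig
# makes no promise about ordering and can return complex values here.
C = np.cov(data.T)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# 2D PCA - get the two eigenvectors with the largest eigenvalues
# (sort explicitly in descending order rather than relying on column order)
order = np.argsort(eigenvalues)[::-1]
v1, v2 = eigenvectors[:, order[:2]].T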
# Project the data onto the two principal components
data_pc1 = [np.dot(v1, d) for d in data]
data_pc2 = [np.dot(v2, d) for d in data]

# Scatter plot in PCA space
fig = plt.figure()