Principal Component Analysis


What's that?

  • given some data, you transform it into a new orthogonal coordinate system
  • such that the greatest variance of the data lies along the 1st coordinate axis (the first principal axis)
  • the second greatest variance lies along the 2nd coordinate axis
  • and so on

Why is that helpful?

  • if your data points are N-dimensional and you use the new N-dimensional PCA representation, the data points are described in a coordinate system that better fits the distribution of your data in the N-dimensional space
  • if you describe the data using only the first M ≪ N dimensions, i.e., its projection onto the first M principal axes, you compress your data!
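
The compression idea can be tried out with a small sketch. The toy data, the variable names and the choice M = 1 are all made up for illustration, and obtaining the principal axes from an eigendecomposition of the sample covariance matrix is an assumption about the method, not something fixed by the text above.

```python
import numpy as np

# Hypothetical toy data: 200 points in N = 3 dimensions that vary mostly
# along a single direction, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

# Center the data and take the principal axes from the sample covariance.
mean = X.mean(axis=0)
Xc = X - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]      # largest variance first
axes = eigvecs[:, order]               # columns are the principal axes

# Keep only the first M principal axes: each point becomes M numbers.
M = 1
Z = Xc @ axes[:, :M]                   # compressed data, shape (200, M)

# Map back to the N-dimensional space (a lossy reconstruction).
X_hat = Z @ axes[:, :M].T + mean

# Mean squared reconstruction error; small here because the toy data
# is almost one-dimensional.
mse = np.mean((X - X_hat) ** 2)
print("compressed shape:", Z.shape, "reconstruction MSE:", mse)
```

Each point is stored with M numbers instead of N. What is lost is the variance along the discarded axes.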

How to compute it?
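
One common recipe, given here as a minimal sketch rather than as the definitive method, is: center the data, build the covariance matrix, compute its eigenvectors, and sort them by decreasing eigenvalue. The function name `pca`, the toy data and the division by n - 1 are assumptions made for this illustration.

```python
import numpy as np

def pca(X):
    """Sketch of PCA for data X of shape (n_samples, n_dims).

    Returns the sample mean, the variances along the principal axes
    (sorted in decreasing order) and the principal axes as columns.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # C is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    return mean, eigvals[order], eigvecs[:, order]

# Hypothetical correlated 2D data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

mean, variances, axes = pca(X)

# Coordinates of the data in the new (PCA) coordinate system.
Y = (X - mean) @ axes
print("variances along the principal axes:", variances)
```

In the new coordinates the variances appear in decreasing order and the coordinates are uncorrelated, which is the property described in the first section.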

But how to compute the covariance matrix?

Well, for a data matrix X we actually don't use the true covariance matrix C:

C_ij = E[ (X_i - mean_i) (X_j - mean_j) ]

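but the sample covariance matrix, in which the expectation is replaced by an average over the observed data points and the true means by the sample means.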
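
A minimal sketch of the sample covariance matrix, assuming the rows of X are the data points and the columns are the variables. Dividing by n - 1 (the unbiased estimate) is an assumption of this sketch. Dividing by n is also used in practice and only rescales the matrix, so the principal axes stay the same.

```python
import numpy as np

# Hypothetical data matrix: n = 100 samples (rows), 3 variables (columns).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
n = X.shape[0]

# Replace the true means by the sample means ...
mean = X.mean(axis=0)
Xc = X - mean

# ... and the expectation by an average over the samples (with n - 1).
S = Xc.T @ Xc / (n - 1)

# NumPy's np.cov computes the same matrix; rowvar=False tells it that
# the variables are stored in the columns.
S_numpy = np.cov(X, rowvar=False)
print(np.allclose(S, S_numpy))
```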


Video #1

Prof. Alexander Ihler nicely explains what PCA is good for, how to compute it, and how to use it, taking the eigenface representation of arbitrary faces as an example.

BTW: Alexander Ihler also has more machine-learning video tutorials on his YouTube channel.

Video #2

Last modified: 2014/01/11 13:21 (external edit)