Feb 19, 2019       tags: | math notes | visualisation |

Covariance matrix and principal component analysis — an intuitive linear algebra approach

>>> Click here if you want to skip to the interactive visualization of PCA

Covariance is one subject that gets exhaustive formal treatment in statistical textbooks yet is often left vague on the intuitive level. In my previous post, I compared sample covariance and the dot product of zero-centered observations of random variables. Now I would like to discuss a powerful generalization of covariance — the covariance matrix. The entries of this matrix consist of covariances between each pair of random variables $X_i$ and $X_j$:

$$\Sigma = \begin{pmatrix} \operatorname{cov}(X_1, X_1) & \cdots & \operatorname{cov}(X_1, X_m) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(X_m, X_1) & \cdots & \operatorname{cov}(X_m, X_m) \end{pmatrix}$$

The diagonal entries ($\Sigma_{ii} = \operatorname{cov}(X_i, X_i)$) are, of course, just the variances of $X_i$.

For illustrative purposes, we will consider a simplistic case of $n$ observations of $m$ zero-centered random variables, such that each covariance matrix entry simplifies to

$$\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = \frac{1}{n}\,\mathbf{x}_i \cdot \mathbf{x}_j,$$

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are vectors of $n$ observations of the random variables $X_i$ and $X_j$ respectively. We are going to assume that $n$ is sufficiently large so that the sample means are close enough to zero.
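As a quick sanity check of this dot-product formula, here is a minimal numpy sketch (the variables and their correlation structure are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of observations; large, so sample means are ~0

# Two zero-centered random variables with a built-in covariance of 0.6
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.6 * x1 + rng.normal(0.0, 0.5, n)

# Sample covariance as a scaled dot product of the observation vectors
cov_12 = np.dot(x1, x2) / n
print(cov_12)  # close to 0.6
```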

One more thing we need to agree on is how to represent our sample data. Let $\mathbf{x}_1, \dots, \mathbf{x}_m$ be rows in the following data matrix:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

Then each column of this matrix will represent one set of simultaneous observations of all $m$ random variables. The neat thing about this matrix representation is that we can take this matrix, multiply it by its transpose times $\frac{1}{n}$, and that gives us the covariance matrix:

$$\Sigma = \frac{1}{n} X X^T$$

Of course, there is nothing mysterious about this result — it’s just the way matrix multiplication works (“rows dot columns”). However, this representation of the data and its covariance matrix will make it much easier to demonstrate the following point:
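The "rows dot columns" identity is easy to verify numerically. A small sketch (the particular covariance structure is invented for the demo; `np.cov` with `bias=True` uses the same $\frac{1}{n}$ normalisation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # observations per variable

# Zero-centered data matrix: each row is one variable, each column one
# simultaneous observation of all three variables
X = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[2.0, 0.8, 0.3],
         [0.8, 1.0, 0.5],
         [0.3, 0.5, 1.5]],
    size=n,
).T

# "Rows dot columns": (1/n) X X^T yields every pairwise covariance at once
Sigma = X @ X.T / n

print(np.allclose(Sigma, np.cov(X, bias=True), atol=0.05))  # True
```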

*** The eigenvectors of a covariance matrix represent the basis in which the data is uncorrelated. ***

Let’s start by asking the following question: if there is a linear transformation (specifically, a rotation $R$) that transforms our data matrix $X$ in a way that the resulting matrix $RX$ has no correlation between its rows, how can we find this transformation $R$? Let’s recollect that in a covariance matrix, all non-zero off-diagonal entries indicate correlation. Therefore, if there is no correlation after we apply the transformation to our data, the new covariance matrix has to be diagonal:

$$\frac{1}{n}(RX)(RX)^T = \frac{1}{n} R X X^T R^T = D$$

One property of rotation matrices is that their transpose is the same as the inverse matrix: $R^T = R^{-1}$. Hence

$$R \left( \frac{1}{n} X X^T \right) R^{-1} = D$$

Note that $\frac{1}{n} X X^T = \Sigma$ is the covariance matrix of the original data, so

$$R \Sigma R^{-1} = D \qquad \Leftrightarrow \qquad \Sigma = R^{-1} D R$$

Since $\Sigma$ is a symmetric matrix, it can be eigen-decomposed as $\Sigma = Q \Lambda Q^{-1}$, where $Q$ is the matrix whose columns are eigenvectors of $\Sigma$, and $\Lambda$ is the diagonal matrix whose entries are eigenvalues of $\Sigma$.

Let’s substitute the eigendecomposition for $\Sigma$:

$$Q \Lambda Q^{-1} = R^{-1} D R$$

Since both $\Lambda$ and $D$ are diagonal, we must conclude that $\Lambda = D$ and that $Q$ and $R$ are inverses of each other. Therefore representing the data matrix in the basis of the eigenvectors $Q$ is equivalent to applying the transformation (rotation) $R = Q^{-1} = Q^T$ that removes correlation between the variables.
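The whole derivation can be checked in a few lines of numpy (synthetic correlated data; `np.linalg.eigh` is used because $\Sigma$ is symmetric):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.7 * x1 + rng.normal(0.0, 0.4, n)
X = np.vstack([x1, x2])      # rows = variables, columns = observations

Sigma = X @ X.T / n          # covariance matrix of the original data

# eigh returns eigenvalues (ascending) and orthonormal eigenvectors Q
eigvals, Q = np.linalg.eigh(Sigma)

# Rotating the data into the eigenbasis (R = Q^T) decorrelates the rows
Y = Q.T @ X
Sigma_Y = Y @ Y.T / n
print(np.round(Sigma_Y, 3))  # diagonal: the eigenvalues; off-diagonal: ~0
```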

Removing correlation is the goal of principal component analysis (PCA), which is why the eigenvectors of the covariance matrix are called principal components. The following interactive demonstration (powered by vtvt) shows how principal components are affected by the distribution of data points. Try arranging the points into a parabola shape and note what happens — this is because covariance/correlation are measures of collinearity, and non-linear relationships between random variables cannot be captured by them properly.
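The parabola effect is easy to reproduce numerically. In this sketch (the specific distribution is my choice for illustration), `y` is completely determined by `x`, yet their covariance comes out near zero because the relationship has no linear component:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Perfect non-linear dependence: y is fully determined by x
x = rng.uniform(-1.0, 1.0, n)
y = x ** 2

# Zero-center both variables before taking the dot product
xc, yc = x - x.mean(), y - y.mean()
cov_xy = np.dot(xc, yc) / n
print(cov_xy)  # ~0: covariance misses the parabola entirely
```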


PCA is not a type of regression. It doesn’t model anything. It merely presents your data in a different basis, which can help you identify and discard the least important principal components. If you perform OLS regression on the same data and plot the line, you’ll find that it won’t be parallel to any of the eigenvectors, and that’s just how it’s supposed to be.
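To see the difference concretely, here is a small comparison on synthetic data (the slopes and noise levels are invented for the demo): the OLS slope of $y$ on $x$ and the slope of the first principal component come out noticeably different.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 0.5, n)

# OLS slope of y on x (zero-centered data, so no intercept is needed)
ols_slope = np.dot(x, y) / np.dot(x, x)

# Direction of the first principal component (largest eigenvalue)
Sigma = np.cov(np.vstack([x, y]), bias=True)
eigvals, Q = np.linalg.eigh(Sigma)
pc1 = Q[:, -1]                # eigh sorts eigenvalues in ascending order
pca_slope = pc1[1] / pc1[0]

print(ols_slope, pca_slope)   # the two slopes differ
```

OLS minimizes vertical distances to the line, while the first principal component minimizes perpendicular distances, so the two directions coincide only in degenerate cases.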
