What is the Explained Variance in PCA

As part of the series of tutorials on PCA with Python, we will learn what the explained variance is and what it means in Principal Component Analysis.

What is the Explained Variance in Principal Component Analysis?

The explained variance in Principal Component Analysis (PCA) represents the proportion of the total variance attributed to (explained by) each principal component.

It helps us understand how much information is retained after dimensionality reduction. It is the portion of the original data’s variability that is captured by each principal component.

The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.

Specifically, in scikit-learn the `explained_variance_` attribute of a fitted PCA is an array in which each value equals the variance of one principal component. The length of the array is equal to the number of components defined with `n_components`.
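As a minimal sketch of this, assuming the Iris dataset from scikit-learn and `n_components=3` (both are illustrative choices), the array can be inspected on a fitted `PCA` object:

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the Iris features (4 columns) and standardize them
X = datasets.load_iris().data
x_scaled = StandardScaler().fit_transform(X)

# Keep 3 principal components
pca = PCA(n_components=3).fit(x_scaled)

# One variance value per principal component, largest first
print(pca.explained_variance_)

# The array length matches n_components
print(len(pca.explained_variance_))
```

The values come out sorted in decreasing order, so the first principal component always carries the most variance.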

What is an Eigenvector in PCA?

An eigenvector in PCA is a vector of the transformation matrix, with a length equal to 1 (a unit vector), that represents the direction of a principal component.
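The unit-length property can be checked with a short sketch. It assumes the Iris dataset and `n_components=3` as illustrative choices, and relies on scikit-learn storing one principal direction per row of `components_`:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the Iris features and fit a 3-component PCA
X = datasets.load_iris().data
x_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(x_scaled)

# Each row of components_ is one eigenvector (principal direction)
eigenvectors = pca.components_

# Euclidean length of each eigenvector; each is approximately 1
norms = np.linalg.norm(eigenvectors, axis=1)
print(norms)

# The directions are also mutually orthogonal, so this is close to the identity
gram = eigenvectors @ eigenvectors.T
print(np.round(gram, 6))
```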

What is an Eigenvalue in PCA?

The eigenvalue is the coefficient associated with an eigenvector. It shows the variance that can be attributed to the corresponding principal component and gives the unit eigenvector its length (scale). The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.

Taken together, the eigenvalues form an array in which each value equals the variance of one principal component.
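To make the link between eigenvalues and component variances concrete, here is a hedged sketch that computes the eigenvalues of the sample covariance matrix with NumPy and compares them with scikit-learn's `explained_variance_`. The Iris dataset and `n_components=3` are illustrative assumptions:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the Iris features
X = datasets.load_iris().data
x_scaled = StandardScaler().fit_transform(X)

# Sample covariance matrix of the features (np.cov divides by n - 1)
cov = np.cov(x_scaled, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]  # reorder so the largest comes first

# Compare the 3 largest eigenvalues with the PCA component variances
pca = PCA(n_components=3).fit(x_scaled)
print(eigenvalues[:3])
print(pca.explained_variance_)
```

The two printed arrays should agree up to floating-point error, since both describe the variance along the same principal directions.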

What is the Difference Between the Explained Variance and the Eigenvalue?

Although the eigenvalue and the explained variance in Principal Component Analysis (PCA) are related concepts and are often used as synonyms, they are not exactly the same.

Eigenvalues indicate the variance along each principal component. Explained variance is the proportion of total dataset variance captured by each principal component.
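The distinction can be illustrated with a small sketch: dividing each eigenvalue by the sum of all eigenvalues gives the proportion of total variance, which scikit-learn exposes as `explained_variance_ratio_`. The Iris dataset is an illustrative assumption, and all 4 components are kept here so that the eigenvalues sum to the total variance:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the Iris features and keep all 4 components
X = datasets.load_iris().data
x_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=4).fit(x_scaled)

# Eigenvalues: variance along each principal component
eigenvalues = pca.explained_variance_

# Proportion of the total variance captured by each component
proportions = eigenvalues / eigenvalues.sum()
print(proportions)
print(pca.explained_variance_ratio_)

# Cumulative share of variance retained when keeping the first k components
print(np.cumsum(proportions))
```

The cumulative sum is a common way to decide how many components to keep, for example by choosing the smallest number whose cumulative share passes a chosen threshold.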

How to Plot the Feature Explained Variance in Python?

We can plot the PCA explained variance to see the variance of each principal component feature.

```python
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Load the Iris dataset, then take features and targets separately
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Data Scaling
x_scaled = StandardScaler().fit_transform(X)

# Reduce from 4 to 3 features with PCA
pca = PCA(n_components=3)

# Fit and transform data
pca_features = pca.fit_transform(x_scaled)

# Bar plot of explained_variance (one bar per principal component)
plt.bar(
    range(1, len(pca.explained_variance_) + 1),
    pca.explained_variance_
)

plt.xlabel('PCA Feature')
plt.ylabel('Explained variance')
plt.title('Feature Explained Variance')
plt.show()
```