# What is the Explained Variance in PCA (Python Example)

As part of the series of tutorials on Python PCA, we will learn what the explained variance is and what it means in Principal Component Analysis.

## What is the Explained Variance in Principal Component Analysis?

The explained variance in Principal Component Analysis (PCA) represents the amount of the total variance attributed to (explained by) each principal component.

It helps us understand how much information is retained after dimensionality reduction. It is the portion of the original data’s variability that is captured by each principal component.

Each explained variance value corresponds to an eigenvalue of the data's covariance matrix. The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.

Specifically, it is an array where each value equals the variance of one principal component, and the length of the array equals the number of components defined with `n_components`.

## Explained Variance in Python

In scikit-learn, the explained variance is accessed using the `explained_variance_` attribute of a fitted `PCA` object (named `pca` in the examples below).

``pca.explained_variance_``

In this Python example, we load the iris dataset, scale its features and apply PCA to reduce the original dataset to two dimensions. We fit the PCA object, transform the data and finally show the explained variance.

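```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the iris features and targets
iris = datasets.load_iris()
X = iris.data
y = iris.target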

# Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Apply PCA with two components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)

explained_variance = pca.explained_variance_
explained_variance
```

The result is an explained variance, expressed as an array with two values.

``array([2.93808505, 0.9201649 ])``

## Interpret the Explained Variance in PCA

The explained variance array contains absolute values, expressed in units of variance. The greater the value, the more variance the corresponding principal component captures. In the example above, PC1 accounts for about 2.94 units of variance in the scaled dataset, and PC2 accounts for about 0.92 units.
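One way to check this interpretation, as a minimal sketch: the explained variance of each component should match the sample variance (computed with `ddof=1`) of the corresponding column of the transformed data. The variable names `X_scaled` and `projected_variance` are illustrative and not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same pipeline as above: scale the iris features, then keep two components
X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Sample variance (ddof=1) of each column of the projected data
projected_variance = np.var(X_pca, axis=0, ddof=1)

# Compare the two arrays; they should agree up to floating-point error
print(np.allclose(projected_variance, pca.explained_variance_))
```

Note that `ddof=1` matters here: with the NumPy default of `ddof=0`, the values would differ by a factor of (n - 1) / n.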

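To make it more useful, we generally use the explained variance ratio, which gives the ratio of each explained variance to the total variance of the dataset. The total is the sum of the explained variances of all four components, not only of the two that were kept. For the four standardized iris features, it is about 4.0268.

``explained variance ratio of PC1 = 2.93808505 / 4.0268 ≈ 0.7296``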

Here the explained variance ratio is accessed using the `pca.explained_variance_ratio_` attribute.

```python
pca.explained_variance_ratio_
```

``array([0.72962445, 0.22850762])``

Now we can see that PC1 contributes 73% of the variance and PC2 contributes 23%, so together these two principal components explain 96% of the variance in the data. The remaining 4% is what was “discarded” when reducing dimensions.
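The ratio can also be reproduced by hand. The sketch below assumes the same scaled iris data and computes the total variance as the sum of the sample variances of all four features; `total_variance`, `manual_ratio` and `cumulative_ratio` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_scaled)

# Total variance: sum of the sample variances (ddof=1) of all four scaled features
total_variance = np.var(X_scaled, axis=0, ddof=1).sum()

# Ratio = variance of each kept component / total variance
manual_ratio = pca.explained_variance_ / total_variance
print(np.allclose(manual_ratio, pca.explained_variance_ratio_))

# Running total of the share of variance kept by the first 1, 2, ... components
cumulative_ratio = np.cumsum(pca.explained_variance_ratio_)
print(cumulative_ratio)
```

The last entry of `cumulative_ratio` is the share of variance retained by the two components together, which corresponds to the 96% mentioned above.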

```python
pd.DataFrame({
    'Explained Variance': pca.explained_variance_,
    'Explained Variance Ratio': pca.explained_variance_ratio_,
}, index=['PC1', 'PC2'])
```

## How to Plot the Feature Explained Variance in Python?

We can plot the PCA explained variance to see the variance of each principal component feature.

```python
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# load features and targets separately
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Data Scaling
x_scaled = StandardScaler().fit_transform(X)

# Reduce from 4 to 3 features with PCA
pca = PCA(n_components=3)

# Fit and transform data
pca_features = pca.fit_transform(x_scaled)

# Bar plot of explained_variance
plt.bar(
    range(1, len(pca.explained_variance_) + 1),
    pca.explained_variance_
)

plt.xlabel('PCA Feature')
plt.ylabel('Explained variance')
plt.title('Feature Explained Variance')
plt.show()
```

## What is the Difference Between the Explained Variance and the Eigenvalue?

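The eigenvalue and the explained variance in Principal Component Analysis (PCA) are closely related concepts, and the terms are often used as synonyms, but it helps to be precise about which quantity is meant.

Eigenvalues of the covariance matrix indicate the variance along each principal component, which is what scikit-learn reports in `explained_variance_`. The explained variance ratio (`explained_variance_ratio_`) is the proportion of total dataset variance captured by each principal component, that is, each eigenvalue divided by the sum of all eigenvalues.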
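As a rough check of how these quantities relate in scikit-learn, the sketch below compares the eigenvalues of the sample covariance matrix with `explained_variance_`, and the normalized eigenvalues with `explained_variance_ratio_`. It keeps all four components so that the arrays have the same length; the names `covariance` and `eigenvalues` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=4).fit(X_scaled)

# Sample covariance matrix of the scaled features (np.cov uses ddof=1 by default)
covariance = np.cov(X_scaled, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
# so reverse them to match PCA's descending order
eigenvalues = np.linalg.eigvalsh(covariance)[::-1]

# Eigenvalues vs. absolute explained variance
print(np.allclose(eigenvalues, pca.explained_variance_))

# Normalized eigenvalues vs. explained variance ratio
print(np.allclose(eigenvalues / eigenvalues.sum(), pca.explained_variance_ratio_))
```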

### What is an Eigenvector in PCA

An eigenvector in PCA is a unit vector (a vector of length 1) from the transformation matrix that represents the direction of a principal component.
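In scikit-learn, these directions are stored as the rows of the `components_` attribute. The following sketch, with illustrative variable names, checks that each row has length 1 and that the rows are orthogonal to each other.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_scaled)

# Each row of components_ is one principal direction (eigenvector)
eigenvectors = pca.components_

# Euclidean length of each direction
lengths = np.linalg.norm(eigenvectors, axis=1)
print(np.allclose(lengths, 1.0))

# Orthonormal rows: their pairwise dot products form the identity matrix
print(np.allclose(eigenvectors @ eigenvectors.T, np.eye(2)))
```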

### What is an Eigenvalue in PCA

The eigenvalue is the coefficient associated with an eigenvector. It shows the variance that can be attributed to the corresponding principal component and, in effect, gives the unit-length eigenvector its magnitude. The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.

Taken together, the eigenvalues form an array where each value equals the variance of one principal component.
