What is the Explained Variance in Principal Component Analysis?
The explained variance in Principal Component Analysis (PCA) represents the proportion of the total variance attributed (explained) by each principal component.
It helps us understand how much information is retained after dimensionality reduction: the more of the original data's variability the retained components capture, the less information is lost.
The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.
Specifically, it is an array of values where each value equals the variance of each principal component and the length of the array is equal to the number of components defined with
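As a rough illustration of the array described above, the sketch below fits scikit-learn's PCA on the standardized Iris dataset and prints both the per-component variance (`explained_variance_`) and its share of the total (`explained_variance_ratio_`). The dataset choice and the value `n_components=3` are assumptions made for the example only.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris features (4 columns) and standardize them
X = datasets.load_iris().data
x_scaled = StandardScaler().fit_transform(X)

# Keep 3 of the 4 possible components (illustrative choice)
pca = PCA(n_components=3).fit(x_scaled)

# Variance of each retained component, one value per component
print(pca.explained_variance_)

# The same information expressed as a share of the total variance
print(pca.explained_variance_ratio_)
```

Both arrays have one entry per retained component and are sorted from the most to the least important component.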
What is an Eigenvector in PCA?
In PCA, an eigenvector is a unit vector (a vector of length 1) that represents the direction of a principal component. The eigenvectors of the data's covariance matrix make up the transformation matrix.
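A minimal sketch of the unit-length property, assuming the standardized Iris dataset as example data: in scikit-learn, the eigenvectors are stored as the rows of `pca.components_`, so the norm of each row should be 1 up to floating-point error.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data: standardized Iris features (an assumption for this sketch)
x_scaled = StandardScaler().fit_transform(datasets.load_iris().data)

pca = PCA(n_components=3).fit(x_scaled)

# Each row of components_ is one eigenvector (principal axis)
eigenvectors = pca.components_

# Euclidean length of each eigenvector; each should be 1 up to rounding error
lengths = np.linalg.norm(eigenvectors, axis=1)
print(lengths)
```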
What is an Eigenvalue in PCA?
The eigenvalue is the coefficient applied to the eigenvector. It shows the variance that can be attributed to each principal component and scales the unit-length eigenvector accordingly. The larger the eigenvalue, the more important the corresponding eigenvector is in explaining the variance of the data.
Taken together, the eigenvalues form an array in which each value equals the variance of the corresponding principal component.
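To make the link between eigenvalues and component variance concrete, the sketch below computes the eigenvalues of the sample covariance matrix with NumPy and compares them with scikit-learn's `explained_variance_`. It assumes the standardized Iris dataset and keeps all components so the two arrays have the same length.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data: standardized Iris features (an assumption for this sketch)
x_scaled = StandardScaler().fit_transform(datasets.load_iris().data)

# Sample covariance matrix of the features (columns are variables)
cov = np.cov(x_scaled, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]  # largest first, the order PCA uses

# Keep every component so both arrays have 4 entries
pca = PCA().fit(x_scaled)

print(eigenvalues)
print(pca.explained_variance_)
```

The two printed arrays should agree up to floating-point error, since both use the same sample covariance (divisor n - 1).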
What is the Difference Between the Explained Variance and the Eigenvalue?
Although the eigenvalue and the explained variance in Principal Component Analysis (PCA) are related concepts and are often used as synonyms, they are not exactly the same.
Eigenvalues indicate the variance along each principal component. Explained variance is the proportion of total dataset variance captured by each principal component.
|Eigenvalue|Explained variance|
|---|---|
|Variance along each component|Proportion of total dataset variance|
|Larger eigenvalues capture more variance|Expressed as a percentage|
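The relationship between the two can be sketched in a few lines: dividing each eigenvalue by the sum of all eigenvalues gives the proportion of total variance. The example assumes the standardized Iris dataset and keeps all components, so the proportions cover the whole dataset variance.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data: standardized Iris features (an assumption for this sketch)
x_scaled = StandardScaler().fit_transform(datasets.load_iris().data)

# Keep all components so the eigenvalues add up to the total variance
pca = PCA().fit(x_scaled)

eigenvalues = pca.explained_variance_           # variance along each component
proportions = eigenvalues / eigenvalues.sum()   # share of the total variance

# Show each share as a percentage
for i, share in enumerate(proportions, start=1):
    print(f"PC{i}: {share * 100:.1f}%")
```

When all components are kept, `proportions` should match `pca.explained_variance_ratio_`.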
How to Plot the Feature Explained Variance in Python?
We can plot the PCA explained variance to see the variance of each principal component feature.
```python
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# load features and targets separately
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Data Scaling
x_scaled = StandardScaler().fit_transform(X)

# Reduce from 4 to 3 features with PCA
pca = PCA(n_components=3)

# Fit and transform data
pca_features = pca.fit_transform(x_scaled)

# Bar plot of explained_variance
plt.bar(
    range(1, len(pca.explained_variance_) + 1),
    pca.explained_variance_
)

plt.xlabel('PCA Feature')
plt.ylabel('Explained variance')
plt.title('Feature Explained Variance')
plt.show()
```
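A possible variation on the plot above, offered as a sketch rather than a fixed recipe: it plots `explained_variance_ratio_` instead of the raw variance and overlays the cumulative share, which can help when deciding how many components to keep. The step line, colors, and labels are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same setup as above: standardized Iris data reduced to 3 components
x_scaled = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA(n_components=3).fit(x_scaled)

# Share of total variance per component, and its running total
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
positions = range(1, len(ratios) + 1)

# Bars for each component, step line for the cumulative share
plt.bar(positions, ratios, label='Per component')
plt.step(positions, cumulative, where='mid', color='red', label='Cumulative')

plt.xlabel('PCA Feature')
plt.ylabel('Explained variance ratio')
plt.title('Feature Explained Variance Ratio')
plt.legend()
plt.show()
```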
SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.