How to Plot a 2D PCA Scatterplot (with Python Example)

In this tutorial, part of our series on PCA with Python, we will learn how to plot a 2D PCA graph (scatter plot) of the Iris dataset with Python, Scikit-learn, Matplotlib and Seaborn.

What is a 2D PCA Scatter Plot?

A 2D PCA (Principal Component Analysis) scatter plot is a PCA visualization that shows the distribution of data points in two-dimensional space after a dataset has been reduced to its first two principal components.
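To make "reduced to its first two principal components" concrete: the principal components are the orthogonal directions of greatest variance in the centered data, and each point's 2D coordinates are its projections onto the first two of them. The sketch below is our own illustration, not part of the tutorial's pipeline. It computes those projections with NumPy's SVD on standardized Iris data and checks them against scikit-learn's PCA. The helper name pca_2d is hypothetical, and because components are only defined up to sign, the comparison uses absolute values.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_2d(X):
    """Hypothetical helper: project rows of X onto its first two principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # shape (n_samples, 2)


X = StandardScaler().fit_transform(datasets.load_iris().data)
manual = pca_2d(X)
sk = PCA(n_components=2).fit_transform(X)

# Each component may come out with the opposite sign, so compare magnitudes
print(np.allclose(np.abs(manual), np.abs(sk)))
```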

How to Plot a 2D PCA Graph in Python?

To plot a 2D PCA scatter plot in Python, first reduce the number of features to 2 principal components. Then use Matplotlib (directly, or through Seaborn) to draw a two-dimensional scatter plot of the transformed data.


    Here are the detailed steps to plot a 2D PCA scatter plot in Python:

    1. Load the Required Python Libraries
    2. Load your Dataset
    3. Scale and Reduce the Number of Features Using PCA
    4. Prepare the PCA DataFrame
    5. Plot the 2D Scatterplot with Seaborn’s lmplot

    1. Loading the Required Python Libraries

    import matplotlib.pyplot as plt 
    import pandas as pd 
    import seaborn as sns
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    sns.set()
    

    2. Loading the Iris Dataset in Python

    To start, we load the Iris dataset in Python. In the following steps we scale it and use PCA to reduce it to 2 features. To learn what this means, follow our tutorial on PCA with Python.

    # load features and targets separately
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    

    The Iris data contains 150 samples with 4 features each. From this data, we will build the 2D PCA graph with Python.
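Before reducing anything, it can help to confirm what was loaded. A quick check, using the feature_names and target_names attributes of the object returned by load_iris, shows the 4 input features that PCA will compress to 2, and the 3 species that the integer targets 0, 1 and 2 stand for.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)  # (150, 4): 150 flowers, 4 measurements each
print(y.shape)  # (150,): one integer class label per flower
print(iris.feature_names)          # names of the 4 input features
print(list(iris.target_names))     # species names for labels 0, 1, 2
print(np.bincount(y))              # number of samples in each class
```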

    3. Scale and Reduce the Number of Features Using PCA

    Next, standardize the data so that each feature has zero mean and unit variance (PCA is sensitive to the scale of the features), then set n_components to 2.

    # Data Scaling
    x_scaled = StandardScaler().fit_transform(X)
    
    # Reduce from 4 to 2 features with PCA
    pca = PCA(n_components=2)
    
    # Fit and transform data
    pca_features = pca.fit_transform(x_scaled)
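    # Optional check (our addition, not required for the plot): confirm the
    # scaling worked and see how much of the total variance the 2 components keep.
    import numpy as np
    from sklearn import datasets
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    
    X = datasets.load_iris().data
    x_scaled = StandardScaler().fit_transform(X)
    
    # After standardization every feature has mean ~0 and standard deviation ~1
    print(np.allclose(x_scaled.mean(axis=0), 0), np.allclose(x_scaled.std(axis=0), 1))
    
    pca = PCA(n_components=2)
    pca_features = pca.fit_transform(x_scaled)
    
    # Fraction of the total variance captured by PC1 and PC2, and their sum
    print(pca.explained_variance_ratio_)
    print(pca.explained_variance_ratio_.sum())
    
    # For comparison: PCA on the unscaled data gives a different variance split
    print(PCA(n_components=2).fit(X).explained_variance_ratio_)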
    

    4. Prepare the PCA DataFrame

    Next, we will create a PCA DataFrame from the principal component features and map the integer targets to their species names for better legibility.

    # Create dataframe
    pca_df = pd.DataFrame(
        data=pca_features, 
        columns=['PC1', 'PC2'])
    
    # map integer targets to species names
    target_names = {
        0:'setosa',
        1:'versicolor', 
        2:'virginica'
    }
    
    pca_df['target'] = y
    pca_df['target'] = pca_df['target'].map(target_names)
    
    pca_df.head()
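As an alternative to the hand-written target_names dictionary, the species names can be taken from the dataset itself: load_iris exposes them as the NumPy array iris.target_names, so indexing that array with the integer targets produces one name per row. This is a sketch of that variant, not the tutorial's original approach; the resulting DataFrame has the same columns as above.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
pca_features = PCA(n_components=2).fit_transform(
    StandardScaler().fit_transform(iris.data)
)

pca_df = pd.DataFrame(data=pca_features, columns=['PC1', 'PC2'])

# Index the array of species names with the integer targets (0, 1, 2)
# instead of maintaining a separate mapping dictionary
pca_df['target'] = iris.target_names[iris.target]

print(pca_df['target'].value_counts())
```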
    

    5. Plot the 2D Scatterplot with Seaborn’s lmplot

    Finally, use Seaborn’s lmplot function to draw the PCA DataFrame as a two-dimensional scatter plot. Setting fit_reg=False turns off the regression line that lmplot fits by default.

    sns.lmplot(
        x='PC1', 
        y='PC2', 
        data=pca_df, 
        hue='target', 
        fit_reg=False, 
        legend=True
        )
    
    plt.title('2D PCA Graph')
    plt.show()
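Seaborn’s lmplot is a convenience layer on top of Matplotlib, so the same figure can also be drawn with Matplotlib alone. The sketch below assumes the same scaled Iris data and draws one scatter call per class so that each species gets its own color and legend entry. The matplotlib.use("Agg") line is an assumption for running without a display (for example on a server); remove it to open an interactive window with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
pca_features = PCA(n_components=2).fit_transform(
    StandardScaler().fit_transform(iris.data)
)

fig, ax = plt.subplots(figsize=(6, 5))

# One scatter call per species: the boolean mask selects that class's rows
for class_idx, class_name in enumerate(iris.target_names):
    mask = iris.target == class_idx
    ax.scatter(pca_features[mask, 0], pca_features[mask, 1],
               label=class_name, alpha=0.8)

ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('2D PCA Graph')
ax.legend()
plt.show()
```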
    

    Next Steps

    After plotting a 2D PCA scatter plot, natural next steps are to plot a 3D PCA scatter plot and a 2D PCA biplot.

    Full Code

    import matplotlib.pyplot as plt 
    import pandas as pd 
    import seaborn as sns
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    sns.set()
    
    # load features and targets separately
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    # Data Scaling
    x_scaled = StandardScaler().fit_transform(X)
    
    # Reduce from 4 to 2 features with PCA
    pca = PCA(n_components=2)
    
    # Fit and transform data
    pca_features = pca.fit_transform(x_scaled)
    
    # Create dataframe
    pca_df = pd.DataFrame(
        data=pca_features, 
        columns=['PC1', 'PC2'])
    
    # map integer targets to species names
    target_names = {
        0:'setosa',
        1:'versicolor', 
        2:'virginica'
    }
    
    pca_df['target'] = y
    pca_df['target'] = pca_df['target'].map(target_names)
    
    # Plot the 2D PCA Scatterplot
    sns.lmplot(
        x='PC1', 
        y='PC2', 
        data=pca_df, 
        hue='target', 
        fit_reg=False, 
        legend=True
        )
    
    plt.title('2D PCA Graph')
    plt.show()
    