What is Dimension Reduction in Machine Learning (with Python Example)

Dimensionality reduction, or dimension reduction, is a data transformation technique used in unsupervised machine learning to map data from a high-dimensional space into a low-dimensional space while retaining the meaningful properties of the original data.

In a nutshell, dimension reduction means representing data using fewer predictor variables (features).

It is a major component in making more efficient machine learning algorithms.


Why Dimension Reduction?

The goal of dimension reduction is to represent the data using fewer variables while preserving as much of the structure (variance) of the data as possible. Dimensionality reduction helps to:

  • simplify models and make them easier to interpret,
  • reduce modelling costs,
  • reduce training times,
  • avoid the curse of dimensionality.

How Does Dimension Reduction Work?

Dimension reduction can be separated into linear and non-linear approaches.

Two techniques can be used for dimensionality reduction: feature selection and feature extraction.

Feature Selection

Feature selection, or variable selection, is used to remove redundant or irrelevant features without losing too much information from the data.

Three techniques of feature selection are:

  • Wrapper methods: Train a model on different subsets of features and keep the subset that makes the fewest mistakes. Computationally intensive.
  • Filter methods: Use a fast-to-compute proxy measure, such as information gain, to score features without training a full model. Less intensive than wrapper methods (see the sketch after this list).
  • Embedded methods: Perform feature selection as part of the model construction process, e.g. LASSO regression, Elastic Net regularization, …
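
As an illustration of a filter method, here is a minimal sketch that scores the Iris features with mutual information (an information-gain-style measure) and keeps the two highest-scoring ones; k=2 and mutual_info_classif are illustrative choices, not the only options.

from sklearn import datasets
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Load the Iris dataset (4 features, 3 classes)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Score each feature with mutual information and keep the 2 best ones
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape)           # (150, 4)
print(X_selected.shape)  # (150, 2)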

Feature Extraction

Feature extraction finds patterns in the data and uses them to build new, more compact features, returning a compressed representation of the original data.

Linear feature extraction techniques

  • Principal component analysis (PCA)
  • Non-negative matrix factorization (NMF)
  • Linear discriminant analysis (LDA)

Non-linear feature extraction techniques

  • T-distributed stochastic neighbor embedding (t-SNE)
  • Generalized discriminant analysis (GDA)
  • Autoencoder
  • Kernel PCA

Linear feature extraction techniques

Principal component analysis (PCA)

Principal component analysis, or PCA, is the main linear technique for dimension reduction.

The linear mapping of the data to a lower-dimensional space is performed in a way that maximizes the variance of the data.

PCA assumes that features with low variance are irrelevant and features with high variance are informative. PCA models are difficult to interpret.

Non-negative matrix factorization (NMF)

Non-negative matrix factorization, or NMF, is a dimension reduction technique that factorizes a matrix of non-negative features into the product of two smaller non-negative matrices, whose rows and columns act as the reduced features.

Interpretability of the NMF model

NMF decomposes documents and images into common, recurring patterns (for example, topics in text or parts of images). Because these components are non-negative and additive, NMF models are easy to interpret.

When to use NMF dimension reduction?

Use NMF on non-negative features such as word frequency arrays, and in applications such as recommender systems, purchase histories on e-commerce sites, and computer vision.
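
A minimal sketch using scikit-learn's NMF on a small, hypothetical word-frequency-style array (the values below are made up purely for illustration):

import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative word-frequency array: 4 documents x 6 words
V = np.array([
    [3, 0, 1, 0, 2, 0],
    [2, 0, 0, 1, 3, 0],
    [0, 4, 0, 2, 0, 1],
    [0, 3, 1, 3, 0, 2],
])

# Factorize V (4x6) into W (4x2) and H (2x6), both non-negative
model = NMF(n_components=2, init="random", random_state=0)
W = model.fit_transform(V)  # document-to-component weights
H = model.components_       # component-to-word weights

print(W.shape)  # (4, 2)
print(H.shape)  # (2, 6)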

Linear Discriminant Analysis (LDA)

Linear discriminant analysis, or LDA, is a linear dimensionality reduction technique used as a preprocessing step in machine learning.

It is a generalization of Fisher’s linear discriminant. It works similarly to PCA but focuses on maximizing the separability between two or more known categories.
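
A minimal sketch with scikit-learn on the Iris dataset, reducing the 4 features to 2 discriminant components; note that, unlike PCA, LDA is supervised and needs the class labels:

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load the Iris dataset: 4 features, 3 classes
iris = datasets.load_iris()
X, y = iris.data, iris.target

# With 3 classes, LDA can produce at most 2 discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)

print(X_reduced.shape)  # (150, 2)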

Non-linear feature extraction techniques

T-distributed stochastic neighbor embedding (t-SNE)

T-distributed stochastic neighbour embedding, or t-SNE, is a non-linear dimension reduction technique that maps samples from a high-dimensional space to a 2- or 3-dimensional space while preserving the nearness (local neighbourhoods) of the samples.
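
A minimal sketch with scikit-learn's TSNE on the Iris features; the perplexity value is an illustrative choice and usually needs tuning:

from sklearn import datasets
from sklearn.manifold import TSNE

iris = datasets.load_iris()
X = iris.data

# Embed the 4-dimensional samples into 2 dimensions, preserving local neighbourhoods
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

print(X_embedded.shape)  # (150, 2)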

Generalized discriminant analysis (GDA)

GDA is one of the non-linear dimensionality reduction techniques that reduce dimensionality using kernel methods. It maximizes the ratio of between-class scatter to within-class scatter in a similar fashion as the support-vector machines (SVM) theory does.

Autoencoder

An autoencoder is a neural network trained to reconstruct its input through a smaller, compressed (bottleneck) layer, learning to ignore insignificant data (noise) in the process. The bottleneck representation can then be used as a reduced set of features.

Interesting read: Autoencoders For Dimensionality Reduction
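
A minimal sketch, assuming TensorFlow/Keras is installed; the random data, layer sizes and number of epochs are purely illustrative:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical 20-dimensional data, just for illustration
X = np.random.rand(1000, 20).astype("float32")

# Encoder compresses the 20 features down to a 2-dimensional bottleneck
encoder = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(2),
])

# Decoder reconstructs the 20 features from the bottleneck
decoder = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(10, activation="relu"),
    layers.Dense(20),
])

# The autoencoder is trained to reproduce its own input
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

# The encoder output is the compressed (reduced) representation
X_reduced = encoder.predict(X, verbose=0)
print(X_reduced.shape)  # (1000, 2)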

Kernel principal component analysis (Kernel PCA)

Kernel principal component analysis is an extension of PCA that uses kernel methods (pattern analysis algorithms) to reduce the dimensionality of data with non-linear structure while maximizing the variance of the data.
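
A minimal sketch with scikit-learn's KernelPCA on a synthetic non-linear dataset (two concentric circles); the RBF kernel and gamma value are illustrative choices:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Synthetic non-linear data: two concentric circles in 2 dimensions
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel can capture structure that linear PCA misses
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

print(X_kpca.shape)  # (300, 2)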

Dimension Reduction Example in Python with Scikit-Learn

In this Python example, we will see how we can use PCA to perform dimensionality reduction on the Iris dataset of the Scikit-learn library.

from sklearn import datasets
from sklearn.decomposition import PCA

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data  # Features

# Create a PCA (Principal Component Analysis) instance to reduce dimensions to 2
pca = PCA(n_components=2)

# Fit the PCA model to the data and transform it
X_reduced = pca.fit_transform(X)

# Print the original and reduced dimensions
print(f"Original dimensions: {X.shape}")
print(f"Reduced dimensions: {X_reduced.shape}")

As you can see from the output, we have reduced the data from 4 features to 2 principal components.

Original dimensions: (150, 4)
Reduced dimensions: (150, 2)

Conclusion

This concludes this article on dimensionality reduction techniques.
