What is Clustering in Machine Learning?
Clustering is used to group similar data points together based on their characteristics.
Clustering machine learning algorithms group similar elements in such a way that the elements within a cluster are closer to each other than to the elements of any other cluster.
Example of Clustering Algorithms
Here are the most popular clustering algorithms that we will cover in this article:
- KMeans Clustering
- Hierarchical Clustering
- DBSCAN Clustering
- Gaussian Mixture Models
Below is an overview of Scikit-learn's other clustering methods.
Examples of clustering problems
- Recommender systems
- Semantic clustering
- Customer segmentation
- Targeted marketing
How do clustering algorithms work?
Each clustering algorithm works differently, but the logic behind KMeans and hierarchical clustering is similar. Clustering machine learning algorithms work by:
- Selecting cluster centers
- Computing distances from data points to cluster centers, or between cluster centers
- Redefining cluster centers based on the resulting distances
- Repeating the process until the optimal clusters are reached
This is an overly simplified view of clustering, but we will dive deeper into how each algorithm works specifically in the next sections.
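The iterative loop above can be sketched from scratch with NumPy. This is a minimal illustration, not a production implementation: the toy dataset, the number of clusters, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative groups of 2-D points, far apart from each other.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

k = 2
# Select initial cluster centers (here: random data points).
centers = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(10):  # repeat the process a fixed number of times
    # Compute the distance from every point to every cluster center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Assign each point to its nearest center.
    labels = dists.argmin(axis=1)
    # Redefine each center as the mean of the points assigned to it.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers.shape)  # (2, 2): one 2-D center per cluster
```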
How does KMeans Clustering Work?
The KMeans clustering algorithm works by starting with a fixed number of clusters and moving the cluster centers until the optimal clustering is reached. It does so by:
- Defining a number of clusters at the start
- Selecting random cluster centers
- Computing distances between each point to cluster center
- Finding new cluster centers using the mean of the points assigned to each cluster
- Repeating until convergence.
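The steps above can be sketched with Scikit-learn, which handles the iteration and convergence check for us. The toy dataset and the choice of 3 clusters are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 3 well-separated blobs of 2-D points.
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# The number of clusters must be defined at the start.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.shape)  # (3, 2): one center per cluster
print(len(set(labels)))               # 3 distinct cluster labels
```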
How does Hierarchical Clustering Work?
The hierarchical clustering algorithm works by starting with 1 cluster per data point and merging the clusters together until the optimal clustering is reached. It does so by:
- Having 1 cluster for each data point
- Defining new cluster centers using the mean of X and Y coordinates
- Combining the clusters whose centers are closest to each other
- Finding new cluster centers based on the mean
- Repeating until the optimal number of clusters is reached
The image below shows a dendrogram, which can be used to visualize hierarchical clustering: it starts with 1 cluster per data point at the bottom, merges the closest clusters at each iteration, and ends with a single cluster for the entire dataset at the top.
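The merging process can be sketched with SciPy, whose `linkage` function builds the same tree a dendrogram visualizes; the toy dataset and the cut into 3 flat clusters are illustrative assumptions:

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

# Generate 3 well-separated blobs of 2-D points.
X, _ = make_blobs(n_samples=60, centers=3, random_state=42)

# 'ward' merges, at each step, the pair of clusters that least
# increases the within-cluster variance.
Z = linkage(X, method="ward")  # Z encodes the full merge tree

# Cut the tree to obtain 3 flat clusters
# (passing Z to scipy.cluster.hierarchy.dendrogram would plot it).
labels = fcluster(Z, t=3, criterion="maxclust")
print(len(set(labels)))  # 3
```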
How does DBSCAN Clustering Work?
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
The DBSCAN clustering algorithm works by assuming that clusters are regions of high-density data points separated by regions of low density.
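This density-based logic can be sketched with Scikit-learn's `DBSCAN`. Unlike KMeans, no number of clusters is specified: the `eps` (neighborhood radius) and `min_samples` (points required to form a dense region) values below are assumptions tuned for this toy dataset:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Generate 3 tight, well-separated blobs of 2-D points.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=42)

# eps: radius of a point's neighborhood; min_samples: points needed
# inside that radius for a region to count as dense.
db = DBSCAN(eps=0.8, min_samples=5).fit(X)

# The label -1 marks noise points that fall in low-density regions.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print(n_clusters)
```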
How does Gaussian Mixture Model Clustering Work?
Gaussian Mixture Models, or GMMs, are probabilistic models that use Gaussian distributions, also known as normal distributions, to cluster data points together.
The model fits a chosen number of Gaussian distributions to the data and treats each distribution as a separate cluster, assigning each data point a probability of belonging to each cluster.
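A minimal sketch of this with Scikit-learn's `GaussianMixture`; the toy dataset and the choice of 3 components are illustrative assumptions:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Generate 3 well-separated blobs of 2-D points.
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Fit 3 Gaussian distributions to the data.
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)

labels = gmm.predict(X)       # hard assignment: most likely distribution
probs = gmm.predict_proba(X)  # soft assignment: probability per distribution
print(probs.shape)  # (150, 3): one probability per point per cluster
```

The soft assignments in `probs` are what distinguish GMMs from KMeans: a point near a cluster boundary gets split probabilities rather than a single all-or-nothing label.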
Interesting Work from the Community
How to Master the Popular DBSCAN Clustering Algorithm for Machine Learning by Abhishek Sharma
Polyfuzz auto-mapping + auto-grouping tests by Charly Wargnier
This concludes the introduction to clustering in machine learning. We have covered how clustering works and provided an overview of the most common clustering machine learning models.
The next step is to learn how to use Scikit-learn to train each clustering machine learning model on real data.
Sr SEO Specialist at Seek (Melbourne, Australia). Specialized in technical SEO. On a quest to bring programmatic SEO to large organizations through the use of Python, R and machine learning.