How to Do Keyword Topic Clustering with Python for SEO (Scikit-learn TF-IDF + AffinityPropagation Example)

In this Python SEO tutorial, we will learn how to group keywords into topic clusters using Python and the Scikit-learn library.

The Python script reads a list of keywords from a text file, uses TfidfVectorizer() to build a TF-IDF representation of those keywords, and then applies the AffinityPropagation() clustering algorithm to group them into topic clusters.

Here is an example output of topic clustering with TF-IDF and Affinity Propagation in Scikit-learn.




Python Keyword Clustering Example for SEO

from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Read keywords, one per line, skipping blank lines
with open('keywords.txt', 'r', encoding='utf-8') as f:
    keywords = [line.strip() for line in f if line.strip()]

# Create a Tf-idf Vector with Keywords
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(keywords)

# Perform Affinity Propagation clustering
af = AffinityPropagation().fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

# Get the number of clusters found
n_clusters = len(cluster_centers_indices)
print(f'Number of clusters found: {n_clusters}')

# Create a DataFrame to store the cluster information
cluster_data = {
    'Topic Clusters': labels,
    'Keywords': keywords,
}

# Convert cluster_data to a Pandas DataFrame
cluster_df = pd.DataFrame(cluster_data)

# Save the DataFrame to a CSV file
cluster_df.to_csv('clustered_keywords.csv', index=False)

# Show the keywords sorted by cluster (print so it also displays outside a notebook)
print(cluster_df.sort_values(by='Topic Clusters'))
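
The script above stores cluster_centers_indices_ but identifies each cluster only by its number. As a minimal sketch, the exemplar keyword that Affinity Propagation picks for each cluster could serve as a readable cluster name. The sample keywords, the name_clusters() helper and the 'Cluster Name' column are assumptions made for illustration, not part of the original script, and random_state is set only to keep runs repeatable (it requires a Scikit-learn version that supports that parameter).

```python
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Hypothetical sample keywords, for illustration only
sample_keywords = [
    'python seo tutorial',
    'python seo script',
    'python seo automation',
    'keyword clustering tool',
    'keyword clustering python',
    'keyword clustering sklearn',
    'internal links seo',
    'internal links audit',
    'internal links strategy',
]


def name_clusters(keywords, random_state=0):
    """Cluster keywords and name each cluster after its exemplar keyword."""
    X = TfidfVectorizer().fit_transform(keywords)
    af = AffinityPropagation(random_state=random_state).fit(X)
    exemplars = af.cluster_centers_indices_

    # Each label is a position in cluster_centers_indices_, which in turn
    # holds the row index of the exemplar keyword. A label of -1 means no
    # exemplar was found (for example, when the algorithm fails to converge).
    names = [
        keywords[exemplars[label]] if label >= 0 else None
        for label in af.labels_
    ]

    return pd.DataFrame({
        'Topic Clusters': af.labels_,
        'Cluster Name': names,
        'Keywords': keywords,
    })


named_df = name_clusters(sample_keywords)
print(named_df.sort_values(by='Topic Clusters'))
```

The exemplar is an actual keyword from the list, so it may not always be the best human-readable label, but it gives a quick idea of what each cluster is about.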

What Are Topic Clusters in SEO?

As I defined them in an article for Search Engine Journal, topic clusters are groups of related terms that help you build a site architecture in which all articles are interlinked or receive internal links. They can also help identify articles that could be combined, based on the similarity of the keywords they rank for.

Why Use Affinity Propagation for Clustering Keywords?

The upside of Scikit-learn’s AffinityPropagation() algorithm is that you don’t have to know the number of clusters beforehand. For example, in the Search Engine Journal article linked above, I had to define the number of clusters I wanted in order to create the NMF instance.

With Affinity Propagation, you don’t need to decide how many clusters to create from the keywords. The algorithm iterates until convergence and forms the clusters itself, a process that deserves a more detailed discussion in a separate article.
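
Although the number of clusters is not set directly, it can still be influenced. According to the Scikit-learn documentation, the preference parameter affects how many exemplars (and therefore clusters) are found, and it defaults to the median of the input similarities. The sketch below counts the clusters for a few preference values. The sample keywords, the count_clusters() helper and the specific preference values are assumptions chosen for illustration. With the default Euclidean affinity the similarities are negative squared distances, so the preference values tried here are negative.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample keywords, for illustration only
sample_keywords = [
    'python seo tutorial',
    'python seo script',
    'python seo automation',
    'keyword clustering tool',
    'keyword clustering python',
    'keyword clustering sklearn',
    'internal links seo',
    'internal links audit',
    'internal links strategy',
]


def count_clusters(keywords, preference=None, random_state=0):
    """Return how many clusters Affinity Propagation finds for a preference."""
    X = TfidfVectorizer().fit_transform(keywords)
    af = AffinityPropagation(
        preference=preference,      # None uses the median similarity
        random_state=random_state,  # fixed only to keep runs repeatable
    ).fit(X)
    # One exemplar per cluster; empty if the algorithm found none
    return len(af.cluster_centers_indices_)


# Compare the default preference with two lower (more negative) values
for preference in (None, -0.5, -5.0):
    n_found = count_clusters(sample_keywords, preference=preference)
    print(f'preference={preference}: {n_found} clusters')
```

Lower preference values generally lead to fewer clusters, so this is one knob to try if the default run splits the keywords too finely.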
