How to Do Keyword Topic Clustering with Python for SEO (Scikit-learn TF-IDF + AffinityPropagation Example)

In this Python SEO tutorial, we will learn how to group keywords into topic clusters using Python and the Scikit-learn library.

The Python script reads a list of keywords from a text file (one keyword per line), uses TfidfVectorizer() to turn each keyword into a TF-IDF vector, and then applies the AffinityPropagation() clustering algorithm to group similar keywords into topic clusters.

Here is an example output of topic clustering with TF-IDF and Affinity Propagation in Scikit-learn.


Python Keyword Clustering Example for SEO

    from sklearn.cluster import AffinityPropagation
    from sklearn.feature_extraction.text import TfidfVectorizer
    import pandas as pd
    
    # Read keywords (one per line), skipping blank lines
    with open('keywords.txt', 'r', encoding='utf-8') as f:
        keywords = [line.strip() for line in f if line.strip()]
    
    # Create a TF-IDF matrix from the keywords (one row per keyword)
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(keywords)
    
    # Perform Affinity Propagation clustering
    # random_state fixes the small tie-breaking noise so results are reproducible
    af = AffinityPropagation(random_state=0).fit(X)
    cluster_centers_indices = af.cluster_centers_indices_
    labels = af.labels_
    
    # Get the number of clusters found
    n_clusters = len(cluster_centers_indices)
    print(f'Number of topic clusters: {n_clusters}')
    
    # Pair each keyword with its cluster label
    cluster_data = {
        'Topic Clusters': labels,
        'Keywords': keywords
    }
    
    # Convert cluster_data to a Pandas DataFrame, sorted by cluster
    cluster_df = pd.DataFrame(cluster_data).sort_values(by='Topic Clusters')
    
    # Save the sorted DataFrame to a CSV file and display it
    cluster_df.to_csv('clustered_keywords.csv', index=False)
    print(cluster_df)
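The CSV stores only numeric cluster IDs. A useful extension is to name each cluster after its exemplar (the keyword Affinity Propagation picked as the cluster's center) and list each cluster's keywords together. The sketch below uses an inline, hypothetical keyword list instead of keywords.txt so it runs on its own, and the "Cluster Name" column is an illustrative addition, not part of the original script.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Hypothetical keywords standing in for the contents of keywords.txt
keywords = [
    "python seo tutorial", "python seo script", "python seo automation",
    "keyword clustering python", "keyword clustering tool", "keyword clustering seo",
    "internal linking strategy", "internal linking seo", "internal linking tool",
]

X = TfidfVectorizer().fit_transform(keywords)
af = AffinityPropagation(random_state=0).fit(X)

cluster_df = pd.DataFrame({"Topic Clusters": af.labels_, "Keywords": keywords})

# Label k belongs to the exemplar at af.cluster_centers_indices_[k].
# If no exemplars are found (possible when AP fails to converge), every label is -1.
exemplars = [keywords[i] for i in af.cluster_centers_indices_]
cluster_df["Cluster Name"] = [
    exemplars[label] if label >= 0 else "unclustered" for label in af.labels_
]

# One row per cluster, with its keywords collected into a list
summary = cluster_df.groupby("Cluster Name")["Keywords"].apply(list)
print(summary)
```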
    

What Are Topic Clusters in SEO?

Based on the definition I wrote for Search Engine Journal, topic clusters are groups of related terms that help you build a site architecture in which every article either links to related articles or receives internal links from them. They can also help identify articles that could be merged because they rank for similar keywords.
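One way to act on the second point is to compare articles by the keywords they rank for. The sketch below is an illustration under stated assumptions, not the method from the Search Engine Journal article. It joins each article's ranking keywords into a single TF-IDF document, computes pairwise cosine similarity, and flags pairs above an arbitrary 0.5 threshold as possible merge candidates. The URLs and keywords are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: article URL -> keywords that article ranks for
article_keywords = {
    "/python-seo-guide": ["python seo", "python for seo", "seo python script"],
    "/seo-with-python": ["python seo", "seo python tutorial", "python for seo"],
    "/internal-linking": ["internal linking", "internal links seo"],
}

urls = list(article_keywords)
# Treat all of an article's ranking keywords as one document
docs = [" ".join(kws) for kws in article_keywords.values()]
sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))

# Pairs at or above this (arbitrary, assumed) threshold are flagged for manual review
THRESHOLD = 0.5
merge_candidates = [
    (urls[i], urls[j], round(float(sim[i, j]), 2))
    for i in range(len(urls))
    for j in range(i + 1, len(urls))
    if sim[i, j] >= THRESHOLD
]
print(merge_candidates)
```

A high score only suggests overlap. Whether two pages should actually be merged still depends on search intent and the content itself.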

Why Use Affinity Propagation for Clustering Keywords?

The main advantage of Scikit-learn’s AffinityPropagation() algorithm is that you don’t need to know the number of clusters in advance. For example, in the Search Engine Journal article mentioned above, I had to choose how many clusters I wanted to end up with before I could create an NMF (non-negative matrix factorization) instance.
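To illustrate the difference, the sketch below fits both models on the same hypothetical TF-IDF matrix. NMF needs n_components (the number of topics) up front, guessed here as 3, whereas AffinityPropagation() is called without any cluster count. This is a generic NMF setup, not necessarily the configuration used in the Search Engine Journal article, and max_iter=500 is an arbitrary choice.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample keywords
keywords = [
    "python seo tutorial", "python seo script", "python seo automation",
    "keyword clustering python", "keyword clustering tool", "keyword clustering seo",
    "internal linking strategy", "internal linking seo", "internal linking tool",
]

X = TfidfVectorizer().fit_transform(keywords)

# NMF: the number of topics (n_components) must be chosen up front; 3 is a guess
nmf = NMF(n_components=3, init="nndsvd", random_state=0, max_iter=500)
W = nmf.fit_transform(X)       # keyword-to-topic weights, shape (n_keywords, 3)
nmf_labels = W.argmax(axis=1)  # assign each keyword to its strongest topic

# Affinity Propagation: no cluster count is passed; it is inferred from the data
ap_labels = AffinityPropagation(random_state=0).fit(X).labels_

for kw, n, a in zip(keywords, nmf_labels, ap_labels):
    print(f"{kw:30} NMF topic {n}  AP cluster {a}")
```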

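Instead, Affinity Propagation exchanges messages (responsibilities and availabilities) between keywords to choose exemplars, and it repeats this until the set of exemplars stops changing for a number of iterations (15 by default in Scikit-learn). The number of clusters therefore comes from the data. It is influenced by the preference parameter, which defaults to the median of the input similarities. A full explanation of how the algorithm converges deserves its own article.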
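The sketch below shows how the preference parameter affects the number of clusters on a small, hypothetical keyword list. It recomputes the default preference, which is the median of the negative squared Euclidean similarities that AffinityPropagation() uses for non-precomputed input. It then tries a lower and a higher value. Setting max_iter=1000 is an arbitrary choice to give the algorithm room to converge.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical sample keywords
keywords = [
    "python seo tutorial", "python seo script", "python seo automation",
    "keyword clustering python", "keyword clustering tool", "keyword clustering seo",
    "internal linking strategy", "internal linking seo", "internal linking tool",
]
X = TfidfVectorizer().fit_transform(keywords)

# Similarity used by AffinityPropagation() for non-precomputed input:
# negative squared Euclidean distance. The default preference is its median.
S = -euclidean_distances(X, squared=True)
median_pref = float(np.median(S))

results = {}
# A lower (more negative) preference tends to give fewer clusters; a higher one, more
for name, pref in [("low", 2 * median_pref), ("median", median_pref), ("high", median_pref / 2)]:
    af = AffinityPropagation(preference=pref, random_state=0, max_iter=1000).fit(X)
    results[name] = len(af.cluster_centers_indices_)
    print(f"{name} preference {pref:.2f}: {results[name]} clusters, {af.n_iter_} iterations")

# Without an explicit preference, the default (median) should give the same result
default_af = AffinityPropagation(random_state=0, max_iter=1000).fit(X)
print("default:", len(default_af.cluster_centers_indices_), "clusters")
```

If the algorithm does not converge, Scikit-learn emits a ConvergenceWarning and the resulting labels may be degenerate. Increasing damping or max_iter often helps.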
