Create XML Sitemap With Python
Share this post

This post is part of the complete Guide on Python for SEO

XML sitemaps are useful for the discovery of URLs by Google. Some free tools let you create XML sitemaps, but usually have a limitation to 500 URLs or so. Other paid tool can help you create an extra-large sitemap. Python can help you create XML sitemaps for SEO.

In this post, I will show you how to create a sitemap.xml file using Python and split it into files with less than 50,000 URLs.

I will also Gzip the XML Sitemaps since sitemaps with 50,000 rows can be quite heavy to process and since Google has the capacity to process Gzip compressed sitemaps.

A special thanks to Hamlet Batista for showing me how to do this

Read: Reorganizing XML Sitemaps with Python for Fun & Profit

Full Code

You can download the full code in my Github Repository.

import pandas as pd
import os
import datetime 
from jinja2 import Template
import gzip

# Import List of URLs
list_of_urls = pd.read_csv('list_of_urls.csv')
list_of_urls

# Escape Special Characters
special_char = {'&':'&',"'":''','"':'"','>':'&gt','<':'<'}
list_of_urls[0] = list_of_urls[0].replace(special_char, regex=True)

# Set-Up Maximum Number of URLs (recommended max 50,000)
n = 50000

# Create New Empty Row to Store the Splitted File Number
list_of_urls.loc[:,'name'] = ''

# Split the file with the maximum number of rows specified
new_df = [list_of_urls[i:i+n] for i in range(0,list_of_urls.shape[0],n)]

# For Each File Created, add a file number to a new column of the dataframe
for i,v in enumerate(new_df):
    v.loc[:,'name'] = str(v.iloc[0,1])+'_'+str(i)
    print(v)
            
# Create a Sitemap Template to Populate

sitemap_template='''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% for page in pages %}
    <url>
        <loc>{{page[1]|safe}}</loc>
        <lastmod>{{page[3]}}</lastmod>
        <changefreq>{{page[4]}}</changefreq>
        <priority>{{page[5]}}</priority>        
    </url>
    {% endfor %}
</urlset>'''

template = Template(sitemap_template)

# Get Today's Date to add as Lastmod
lastmod_date = datetime.datetime.now().strftime('%Y-%m-%d')

# Fill the Sitemap Template and Write File
for i in new_df:                           # For each URL in the list of URLs ...                                                          
    i.loc[:,'lastmod'] = lastmod_date      # ... add Lastmod date
    i.loc[:,'changefreq'] = 'daily'        # ... add changefreq
    i.loc[:,'priority'] = '1.0'            # ... add priority 

    # Render each row / column in the sitemap
    sitemap_output = template.render(pages = i.itertuples()) 
    
    # Create a filename for each sitemap like: sitemap_0.xml.gz, sitemap_1.xml.gz, etc.
    filename = 'sitemap' + str(i.iloc[0,1]) + '.xml.gz' 

    # Write the File to Your Working Folder
    with gzip.open(filename, 'wt') as f:   
        f.write(sitemap_output)

Other Technical SEO Guides With Python

That’s it you now have created XML Sitemaps, divided into groups of less than 50,000 URLs, using Python.