How to Create a Simple XML Sitemap With Python (example)

XML sitemaps help Google discover the URLs on your site. Some free tools can create XML sitemaps, but they are usually limited to around 500 URLs, while paid tools can handle much larger sitemaps. Python lets you create XML sitemaps for SEO without that limit.

In this post, I will show you how to create a sitemap.xml file using Python and split it into files of no more than 50,000 URLs each, the maximum the sitemap protocol allows per file.
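Splitting comes down to slicing the URL list into consecutive chunks of at most 50,000 items. Here is a minimal sketch of that arithmetic in plain Python; the `chunk_urls` helper and the generated example URLs are hypothetical, for illustration only.

```python
import math

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most max_urls items.

    Hypothetical helper for illustration, not part of any library.
    """
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


# 120,000 made-up URLs need ceil(120000 / 50000) = 3 sitemap files
urls = [f"https://example.com/page{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
print(len(chunks), math.ceil(len(urls) / MAX_URLS_PER_SITEMAP))  # 3 3
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

The last chunk simply holds whatever is left over, so only that file is smaller than the limit.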

I will also gzip the XML sitemaps, since a file with 50,000 URLs can be large and slow to transfer, and Google can process gzip-compressed sitemaps.
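The compression step itself only needs Python's built-in gzip module. A minimal sketch, assuming the sitemap already exists as a string in memory (the tiny example sitemap and the file name example_sitemap.xml.gz are hypothetical). It also checks the uncompressed size first, because the 50 MB limit in the sitemap protocol applies to the file before compression.

```python
import gzip
import os

MAX_UNCOMPRESSED_BYTES = 52_428_800  # 50 MB, the protocol's limit before compression

# Hypothetical one-URL sitemap used as example content
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page1</loc></url>
</urlset>
"""

raw_bytes = sitemap_xml.encode("utf-8")

# The size limit applies to the uncompressed file, so check before compressing
if len(raw_bytes) > MAX_UNCOMPRESSED_BYTES:
    raise ValueError("Sitemap exceeds 50 MB uncompressed; split it into more files")

# Write the gzip-compressed sitemap
with gzip.open("example_sitemap.xml.gz", "wb") as f:
    f.write(raw_bytes)

# Reading it back returns exactly the original bytes
with gzip.open("example_sitemap.xml.gz", "rb") as f:
    assert f.read() == raw_bytes

print(len(raw_bytes), "bytes uncompressed,", os.path.getsize("example_sitemap.xml.gz"), "bytes gzipped")
```

On a tiny file like this one, gzip overhead can make the compressed file larger; the savings show up on real sitemaps with thousands of similar URLs.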




Best Option to Build an XML Sitemap in Python

The best way to build an XML sitemap in Python is to use the pandas DataFrame.to_xml() method (available since pandas 1.3).

import pandas as pd

# List of URLs
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

# Create a DataFrame whose column is named "loc", the tag the sitemap protocol expects
df = pd.DataFrame({"loc": urls})

# Convert the DataFrame to XML: drop the index and declare the sitemap namespace
xml_data = df.to_xml(
    index=False,
    root_name="urlset",
    row_name="url",
    namespaces={"": "http://www.sitemaps.org/schemas/sitemap/0.9"},
    xml_declaration=True,
)

# Print the output
print(xml_data)

# Save the XML data to a file
with open("sitemap.xml", "w", encoding="utf-8") as file:
    file.write(xml_data)

<?xml version='1.0' encoding='utf-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
  </url>
  <url>
    <loc>https://example.com/page3</loc>
  </url>
</urlset>

Thanks to Erik Heiken for pointing out that a sitemap can be built with the pandas to_xml() method.
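Note that to_xml() uses the lxml parser by default, so lxml must be installed (or you can pass parser="etree"). If you would rather skip pandas altogether, a similar file can be built with the standard library's xml.etree.ElementTree. A minimal sketch, assuming a plain list of URLs and a single lastmod date for all of them; the build_sitemap helper and the example URLs are hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod=None):
    """Return sitemap XML (as UTF-8 bytes) for a list of URLs.

    Hypothetical helper for illustration; lastmod defaults to today's date.
    """
    lastmod = lastmod or datetime.date.today().isoformat()
    # Registering the namespace with an empty prefix gives <urlset xmlns="...">
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes &, < and > in element text automatically
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap(
    ["https://example.com/page1", "https://example.com/page2?a=1&b=2"],
    lastmod="2024-01-01",
)

# Save the XML data to a file
with open("sitemap.xml", "wb") as file:
    file.write(xml_bytes)

print(xml_bytes.decode("utf-8"))
```

Because ElementTree handles escaping, a URL containing & is written as &amp;, which the sitemap protocol requires.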

Generate XML Sitemap with jinja2

Now that pandas has to_xml(), the code below may no longer be the best solution for the job.

A special thanks to Hamlet Batista for showing me how to do this.

Read: Reorganizing XML Sitemaps with Python for Fun & Profit

You can download the full code in my GitHub repository.

import datetime
import gzip

import pandas as pd
from jinja2 import Template

# Import List of URLs (assumes the URLs are in the first column of the CSV)
list_of_urls = pd.read_csv('list_of_urls.csv')
list_of_urls = list_of_urls.iloc[:, [0]]
list_of_urls.columns = ['url']

# Set-Up Maximum Number of URLs (the sitemap protocol allows at most 50,000)
n = 50000

# Split the File With the Maximum Number of Rows Specified
# (.copy() lets us add columns to each chunk without a SettingWithCopyWarning)
new_df = [list_of_urls.iloc[i:i + n].copy() for i in range(0, len(list_of_urls), n)]

# Create a Sitemap Template to Populate
# The |e filter escapes characters such as & in URLs, as the sitemap protocol requires
sitemap_template = '''<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% for page in pages %}
    <url>
        <loc>{{ page.url|e }}</loc>
        <lastmod>{{ page.lastmod }}</lastmod>
        <changefreq>{{ page.changefreq }}</changefreq>
        <priority>{{ page.priority }}</priority>
    </url>
    {% endfor %}
</urlset>'''

template = Template(sitemap_template)

# Get Today's Date to Add as Lastmod
# (ideally, use each page's real last modification date instead)
lastmod_date = datetime.datetime.now().strftime('%Y-%m-%d')

# Fill the Sitemap Template and Write One File per Chunk
for file_number, chunk in enumerate(new_df):  # For each chunk of up to n URLs ...
    chunk['lastmod'] = lastmod_date           # ... add lastmod
    chunk['changefreq'] = 'daily'             # ... add changefreq (ignored by Google)
    chunk['priority'] = '1.0'                 # ... add priority (ignored by Google)

    # Render Each Row of the Chunk in the Sitemap
    sitemap_output = template.render(pages=chunk.itertuples(index=False))

    # Create a Filename for Each Sitemap: sitemap_0.xml.gz, sitemap_1.xml.gz, etc.
    filename = f'sitemap_{file_number}.xml.gz'

    # Write the Gzip-Compressed File to Your Working Folder
    with gzip.open(filename, 'wt', encoding='utf-8') as f:
        f.write(sitemap_output)
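Before submitting the files, it can help to read them back and confirm that each one parses and stays within the 50,000-URL limit. A minimal sketch using only the standard library, assuming the sitemap_*.xml.gz naming used above; the count_sitemap_urls and check_sitemaps helpers are hypothetical. To run on its own, it first writes a small demo file under a different name, so your real sitemaps are not overwritten.

```python
import glob
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def count_sitemap_urls(path):
    """Parse one gzipped sitemap and return its number of <url> entries (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


def check_sitemaps(pattern="sitemap_*.xml.gz"):
    """Map each file matching pattern to its URL count (hypothetical helper)."""
    return {path: count_sitemap_urls(path) for path in sorted(glob.glob(pattern))}


# Demo file with a distinct name (hypothetical content)
demo = f"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="{SITEMAP_NS}">
  <url><loc>https://example.com/page1</loc></url>
  <url><loc>https://example.com/page2</loc></url>
</urlset>"""
with gzip.open("demo_sitemap.xml.gz", "wt", encoding="utf-8") as f:
    f.write(demo)

for path, n_urls in check_sitemaps("demo_sitemap.xml.gz").items():
    status = "OK" if n_urls <= MAX_URLS else "TOO MANY URLS"
    print(f"{path}: {n_urls} URLs ({status})")
```

On your real output, call check_sitemaps() with its default pattern to check every sitemap_*.xml.gz file in the working folder.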


That’s it: you have now created XML sitemaps, split into files of no more than 50,000 URLs each, using Python.
