Recrawl URLs Extracted with Screaming Frog (using Python)

This post is part of the complete Guide on Python for SEO

This tutorial is for you if you want to crawl a website with Screaming Frog and then recrawl new URLs extracted from those pages.

By using custom extraction, you will end up with a list of new URLs spread across columns, a format that is not ready to be sent back to the crawler.

What you need to do is convert the extraction columns into a database format, with one extracted URL per row, that you can send back to Screaming Frog.
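To see what that conversion looks like, here is a minimal sketch on a made-up DataFrame. The column names (Address, Extraction 1, Extraction 2) and URLs are assumptions for illustration; your export will have its own.

import pandas as pd

# Hypothetical wide-format export: one row per crawled page,
# extracted URLs spread across columns
wide = pd.DataFrame({
    'Address': ['https://example.com/a', 'https://example.com/b'],
    'Extraction 1': ['https://example.com/x', 'https://example.com/y'],
    'Extraction 2': ['https://example.com/z', None],
})

# Database (long) format: one extracted URL per row, empty cells dropped
long_format = pd.melt(wide, id_vars='Address', value_name='Extraction').dropna()
print(long_format)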

It should be a built-in function, right?

But no.

Copying them manually is simple if you have crawled 10 pages, but not if you have crawled a hundred thousand.

However, it is really simple to solve this using Python.

If you don’t know how to use Python, I have an entire Guide dedicated to Python for SEO.

Convert Your Extracted URLs to a Database Using Pandas

First, you need to export your crawl from Screaming Frog as an Excel file.

Then, we will convert your extracted URLs to a database format using Pandas.

import pandas as pd

# Load the Screaming Frog export
crawl = pd.read_excel(r'C:\Users\j-c.chouinard\Python\Screaming Frog\burnabycrawl.xlsx')

# Transpose the DataFrame so that each crawled page becomes a column
crawl_transposed = crawl.transpose()

# Remove duplicate values within each page: assigning the deduplicated
# Series back aligns on the index, so duplicates become NaN
for col in crawl_transposed.columns:
    crawl_transposed[col] = crawl_transposed[col].drop_duplicates()

# Bring back the original row/column order
crawl_dedup = crawl_transposed.transpose()

# Remove the status columns (the second and third columns of the export)
crawl_drop = crawl_dedup.drop(crawl_dedup.columns[1:3], axis="columns")

# Melt the extraction columns into a database format:
# one row per (Address, extracted URL) pair, with empty cells dropped
crawl_db = pd.melt(crawl_drop, id_vars='Address', value_vars=crawl_drop.columns[1:], var_name='extractedUrls', value_name='Extraction').dropna()

# Write to Excel
crawl_db.to_excel("urls_to_recrawl.xlsx", index=False, header=False)
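If you only need the extracted URLs themselves, for example to upload in Screaming Frog's list mode, here is a minimal follow-up sketch, assuming the crawl_db DataFrame from above:

# Keep only the extracted URLs, deduplicated, one per line
crawl_db['Extraction'].drop_duplicates().to_csv("urls_to_recrawl.txt", index=False, header=False)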

You now have a file that you can use to recrawl the URLs extracted from your previous crawl.

Other Technical SEO Guides With Python
