This post is part of the complete Guide on Python for SEO
With this Python script, I will show you how to extract Google Search Console data from a list of URLs. This is the perfect solution to get GSC data from URLs you have crawled.
The script is simple, so I will not go into a lot of detail about it. If you want to learn more about how to use the Google Search Console API, read my complete guide on the Google Search Console API, which covers all the details extensively.
The script uses multiple libraries, such as pandas and httplib2.
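If you don't have them yet, you can install all the dependencies with pip (these are the PyPI packages behind the imports below):

pip install pandas httplib2 google-api-python-client oauth2client python-dateutil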
import argparse
import datetime
from collections import defaultdict

import pandas as pd
import httplib2
from dateutil import relativedelta
from googleapiclient.discovery import build
# oauth2client is deprecated in favour of google-auth, but it still works for this flow.
from oauth2client import client
from oauth2client import file
from oauth2client import tools
# One URL per line, no header row.
list_of_urls = pd.read_csv('list_of_urls.csv', header=None)[0].tolist()
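# For reference, list_of_urls.csv is assumed to contain one full URL per line
# and nothing else, e.g.:
#   https://www.example.com/page-1/
#   https://www.example.com/page-2/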
site = 'https://www.example.com/'
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly'] # Variable parameter that controls the set of resources that the access token permits.
CLIENT_SECRETS_PATH = 'client_secrets.json' # Path to your client_secrets.json file.
# Parse the command-line flags required by oauth2client's run_flow().
parser = argparse.ArgumentParser(
    formatter_class=argparse.RawDescriptionHelpFormatter,
    parents=[tools.argparser])
flags = parser.parse_args([])

# Create an OAuth flow from the client secrets file.
flow = client.flow_from_clientsecrets(
    CLIENT_SECRETS_PATH, scope=SCOPES,
    message=tools.message_if_missing(CLIENT_SECRETS_PATH))

# Cache the credentials locally so you only authorize in the browser once.
storage = file.Storage('authorizedcreds.dat')
credentials = storage.get()

if credentials is None or credentials.invalid:
    credentials = tools.run_flow(flow, storage, flags)

http = credentials.authorize(http=httplib2.Http())
webmasters_service = build('searchconsole', 'v1', http=http)
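# Optional sanity check (my addition, using the API's sites().list() method):
# print the properties these credentials can access to confirm auth worked.
site_list = webmasters_service.sites().list().execute()
for entry in site_list.get('siteEntry', []):
    print(entry['siteUrl'], entry['permissionLevel'])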
# Last 3 months, excluding the most recent 3 days (GSC data lags by a few days).
start_date = datetime.date.today() - relativedelta.relativedelta(months=3)
end_date = datetime.date.today() - relativedelta.relativedelta(days=3)
def execute_request(service, property_uri, request):
    """Run a Search Analytics query against the given property."""
    return service.searchanalytics().query(siteUrl=property_uri, body=request).execute()
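# Optional addition, not part of the original script: if you query a long list
# of URLs you may hit the per-site quota, so this sketch retries transient
# quota/server errors with exponential backoff.
import time
from googleapiclient.errors import HttpError

def execute_request_with_retry(service, property_uri, request, retries=3):
    for attempt in range(retries):
        try:
            return execute_request(service, property_uri, request)
        except HttpError as e:
            if e.resp.status in (429, 500, 503) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            else:
                raise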
scDict = defaultdict(list)

for url in list_of_urls:
    request = {
        'startDate': start_date.strftime('%Y-%m-%d'),
        'endDate': end_date.strftime('%Y-%m-%d'),
        'dimensions': ['page'],  # country, device, page, query, searchAppearance
        'dimensionFilterGroups': [{
            'filters': [{
                'dimension': 'page',
                'operator': 'equals',  # contains, equals, notEquals, notContains
                'expression': url
            }]
        }]
    }
    response = execute_request(webmasters_service, site, request)
    scDict['page'].append(url)
    # With the "equals" filter, the API returns at most one row per URL.
    # URLs with no data return no "rows" key at all, so default to 0.
    rows = response.get('rows', [])
    scDict['clicks'].append(rows[0]['clicks'] if rows else 0)
    scDict['impressions'].append(rows[0]['impressions'] if rows else 0)
df = pd.DataFrame(data=scDict)
df['clicks'] = df['clicks'].astype('int')
df['impressions'] = df['impressions'].astype('int')
df.sort_values('clicks', ascending=False, inplace=True)
df
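The final df line displays the DataFrame, which only works in a Jupyter notebook. If you run this as a standalone script, for example on a schedule as described below, print or save the results instead. A minimal sketch (the output filename is my own choice):

print(df)
df.to_csv('gsc_data_from_urls.csv', index=False)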
Automate Your Python Script
To automate the Google Search Console API queries, schedule your Python script with Windows Task Scheduler, or automate it with cron on Mac or Linux, as shown below.
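For example, a crontab entry like the one below runs the script every Monday at 7 a.m.; the interpreter and script paths are placeholders to adapt to your setup.

0 7 * * 1 /usr/bin/python3 /path/to/gsc_url_data.py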
Other Posts to Help You Use The GSC API
- How To Query the Google Search Console API?
- How to Connect to a Google API
- How to Connect to Google Search Console API using Python
- Google Search Console Data From a List of URLs
- Get All Your Search traffic With Google Search Console API (more than 50,000 keywords)
- Find Keyword Cannibalization Using Google Search Console and Python
- Backup Google Search Console Data Into MySQL With Python
- How to use Google Search Console API with R
