Google Search Console Data From a List of URLs

This post is part of the complete Guide on Python for SEO

With this Python script, I will show you how to extract Google Search Console data for a list of URLs. This is the perfect solution when you want GSC data for URLs you have crawled.

The script is simple enough that I will not go into a lot of detail about it. If you want to learn more about how to use the Google Search Console API, read my complete guide on the Google Search Console API, where I cover all the details extensively.
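Before running the script, you need a list_of_urls.csv file containing one URL per line with no header row. A minimal sketch of creating and loading such a file (the filename and URLs below are just examples):

```python
import pandas as pd

# Hypothetical input file: one URL per line, no header row
with open('list_of_urls.csv', 'w') as f:
    f.write('https://www.example.com/page-a/\n')
    f.write('https://www.example.com/page-b/\n')

# header=None keeps pandas from treating the first URL as a column name;
# [0].tolist() turns the single column into a plain Python list
list_of_urls = pd.read_csv('list_of_urls.csv', header=None)[0].tolist()
print(list_of_urls)
```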

import pandas as pd
import datetime
import httplib2
from googleapiclient.discovery import build  # 'apiclient' is a deprecated alias of googleapiclient
from collections import defaultdict
from dateutil import relativedelta
import argparse
from oauth2client import client
from oauth2client import file
from oauth2client import tools

list_of_urls = pd.read_csv('list_of_urls.csv', header=None)[0].tolist() # One URL per line, no header row

site = 'https://www.example.com/'

SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly'] # Variable parameter that controls the set of resources that the access token permits.

CLIENT_SECRETS_PATH = '/client_secrets.json' # Path to client_secrets.json file.
 
parser = argparse.ArgumentParser(
    formatter_class = argparse.RawDescriptionHelpFormatter,
    parents = [tools.argparser])
flags = parser.parse_args([])

flow = client.flow_from_clientsecrets(
    CLIENT_SECRETS_PATH, scope = SCOPES,
    message = tools.message_if_missing(CLIENT_SECRETS_PATH))

storage = file.Storage('/authorizedcreds.dat')
credentials = storage.get()
 

if credentials is None or credentials.invalid:
    credentials = tools.run_flow(flow, storage, flags)
http = credentials.authorize(http=httplib2.Http())

webmasters_service = build('webmasters', 'v3', http=http)

start_date = datetime.date.today()-relativedelta.relativedelta(months=3) # 'months' subtracts 3 months; 'month' would set the month to March
end_date = datetime.date.today()-relativedelta.relativedelta(days=3)

def execute_request(service, property_uri, request):
    return service.searchanalytics().query(siteUrl=property_uri, body=request).execute()


scDict = defaultdict(list)
for url in list_of_urls:
    request = {
        'startDate': datetime.datetime.strftime(start_date, '%Y-%m-%d'),
        'endDate': datetime.datetime.strftime(end_date, '%Y-%m-%d'),
        'dimensions': ['page'],  # country, device, page, query, searchAppearance
        'dimensionFilterGroups': [{
            'filters': [{
                'dimension': 'page',
                'operator': 'equals',  # contains, equals, notEquals, notContains
                'expression': url
            }]
        }]
    }

    response = execute_request(webmasters_service, site, request)

    scDict['page'].append(url)

    # Append 0 when GSC returns no rows for a URL, so all lists stay the same length
    rows = response.get('rows', [])
    if rows:
        scDict['clicks'].append(rows[0].get('clicks', 0))
        scDict['impressions'].append(rows[0].get('impressions', 0))
    else:
        scDict['clicks'].append(0)
        scDict['impressions'].append(0)

df = pd.DataFrame(data=scDict)
df['clicks'] = df['clicks'].astype('int')
df['impressions'] = df['impressions'].astype('int')
df.sort_values('clicks', ascending=False, inplace=True)
print(df)
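For reference, each Search Analytics query returns a dict with an optional 'rows' key. The snippet below parses a hand-built response shaped like the API's output (the URL and metric values are made up) into the same DataFrame structure the script builds, then exports it:

```python
from collections import defaultdict
import pandas as pd

# Hand-built response mimicking the Search Analytics API shape (values are made up)
sample_response = {
    'rows': [
        {'keys': ['https://www.example.com/page-a/'],
         'clicks': 12, 'impressions': 340, 'ctr': 0.0353, 'position': 8.2}
    ]
}

scDict = defaultdict(list)
for row in sample_response.get('rows', []):
    scDict['page'].append(row['keys'][0])          # 'keys' holds the dimension values, in request order
    scDict['clicks'].append(row.get('clicks', 0))
    scDict['impressions'].append(row.get('impressions', 0))

df = pd.DataFrame(scDict)
df.to_csv('gsc_data.csv', index=False)  # hypothetical output filename
print(df)
```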

Automate Your Python Script

To automate your Google Search Console API queries, schedule the Python script with Windows Task Scheduler, or automate the Python script using CRON on Mac or Linux.
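On Mac or Linux, a crontab entry could look like this (the schedule, interpreter path, and script path below are only placeholders; adjust them to your setup):

```
# Run the script every Monday at 07:00 (edit your crontab with `crontab -e`)
0 7 * * 1 /usr/bin/python3 /path/to/gsc_from_urls.py >> /path/to/gsc.log 2>&1
```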

Other Posts to Help You Use The GSC API