Get All Your Search traffic With Google Search Console API (With Code Sample)

Share this post

This post is part of the Guide on Python for SEO and the guide on the Google Search Console API

If you wonder how you can extract all your keywords from Google Search Console, you’re at the right place.

This post will show you how you can automatically extract all your search traffic and all your keywords from Google Webmaster Tool using Python and the Google Search Console API.

(Step-by-step guide with code sample)

The perfect solution for SEOs working on large websites that have a lot of pages and a lot of keywords.

That’s right, you will learn how you can extract more than 25,000 keywords and automatically export the data to a CSV file.

With this script, I managed to extract over 500K rows of data each month from the GSC API.

If you want to look at a more complex, but more powerful solution to extract all your Google Search Console data, you can check this introduction to GSC API video tutorial.

To learn more about Python, view my complete guide on Python for SEO.

Limits of Google Search Console

GSC is a free tool offered by Google, so it is normal that they don’t show you all the data to everyone, as this would be a very expensive process. Instead, they offer you an API.

With GSC, you can’t get a report of top-performing keywords by page, or by country, or by the device.

The Google Search Console performance data is also limiting you to 1000 rows of data, which can be annoying.

The GSC API lets you create keywords by page reports and also lets you query all the pages and all the keywords possible to extract from GSC.

Extract All Your Keywords Data From Google Search Console

For those not used to working with APIs, it can be a hurdle to start, but you should, because Google APIs are awesome for SEOs.

Fear not, I will walk you through to help you get started.

You can learn everything from this post, but if you feel that you need an easier starter guide, just read my post “How to Use Google Search Console API With Python“.

Here are the steps to extract all your search traffic data from Google Search Console.

  1. Get Your Google Search Console API Key
  2. Import Libraries
  3. Log in to the API
  4. Make Your API Call

Step 1: Get Your Google Search Console API Key

First, you need to get an API key to connect to the Google Search Console API.

What is an API Key?

The API key is kind of the username and password that you need to give to access the API.

How do you get the GSC API Keys?

Follow this guide to help you get the detailed steps to get your Google Search Console API key.

Otherwise, here are the simplified steps to get your Google Search Console API keys.

  1. Go to Google’s developers console, and sign-in;
  2. Go to “Dashboard” and click “Enable APIs and Services” ;
  3. Search for “Google Search Console API” and enable the API;
  4. Go to the “credential” tab, click on “create credential” and select “OAuth Client ID”;
  5. Click on “configure the consent screen ” and give a name to your product;
  6. Choose “Other” as the application type and click create;
  7. Copy the client ID and client Secret or go-on and save to download the JSON file.

Step 2: Import Libraries

Here are the libraries that we will need to run this code.

import pandas as pd
import datetime
from datetime import date, timedelta
import httplib2
from googleapiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
from collections import defaultdict
from dateutil import relativedelta
import argparse
from oauth2client import client
from oauth2client import file
from oauth2client import tools
import re
import os
from urllib.parse import urlparse

Step 3: Create Directory for Your Project

The goal here is to use the URL that you have added to create a directory for your project to store the CSV Data. These two functions will extract the domain name and create the project using the domain name.

# Get Domain Name to Create a Project
def get_domain_name(start_url):
    domain_name = '{uri.netloc}'.format(uri=urlparse(start_url))  # Get Domain Name To Name Project
    domain_name = domain_name.replace('.','_')
    return domain_name


# Create a project Directory for this website
def create_project(directory):
    if not os.path.exists(directory):
        print('Create project: '+ directory)
        os.makedirs(directory)

Step 4. Log in to the API

Now, to log in the API, you will need to download your JSON API Key from step 1 and save it as client_secrets.jsonin your working folder.

def authorize_creds(creds):
    # Variable parameter that controls the set of resources that the access token permits.
    SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly'] 

    # Path to client_secrets.json file
    CLIENT_SECRETS_PATH = creds

    # Create a parser to be able to open browser for Authorization
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[tools.argparser])
    flags = parser.parse_args([])

    flow = client.flow_from_clientsecrets(
        CLIENT_SECRETS_PATH, scope = SCOPES,
        message = tools.message_if_missing(CLIENT_SECRETS_PATH))

    # Prepare credentials and authorize HTTP
    # If they exist, get them from the storage object
    # credentials will get written back to a file.
    storage = file.Storage('authorizedcreds.dat')
    credentials = storage.get()

    # If authenticated credentials don't exist, open Browser to authenticate
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
    http = credentials.authorize(http=httplib2.Http())
    webmasters_service = build('webmasters', 'v3', http=http)
    return webmasters_service

Step 5: Execute Request

We will create the python function to execute the request to the Google Search Console API.

def execute_request(service, property_uri, request): return service.searchanalytics().query(siteUrl=property_uri, body=request).execute()

Step 6: Create Function to Process the CSV Files

Extracting Google Search Console (GSC) data by date can be a long task to run if you have a lot of pages. What we will want to do is to save the data to a CSV file whenever we have finished extracting data from a specific date.

We will also want to check if the dates have been processed, so we don’t extract the same information twice.

# Create function to write to CSV
def write_to_csv(data,filename):
    if not os.path.isfile(filename):
        data.to_csv(filename)
    else: # else it exists so append without writing the header
        data.to_csv(filename, mode='a', header=False)

# Read CSV if it exists to find dates that have already been processed.
def get_dates_from_csv(path):
    if os.path.isfile(path):
        data = pd.read_csv(path)
        data = pd.Series(data['date'].unique())
        return data
    else:
        pass

Step 6: Create Function to Extract All the Data

# Create function to extract all the data
def extract_data(site,creds,num_days,output):
    domain_name = get_domain_name(site)             # Get Domain From URL
    create_project(domain_name)                     # Create a new project folder
    full_path = domain_name + '/' + output          # Get Path to add all files
    csv_dt = get_dates_from_csv(full_path)          # Read existing CSV
    webmasters_service = authorize_creds(creds)     # Get credentials to log in the api

    # Set up Dates
    end_date = datetime.date.today() -\
         relativedelta.relativedelta(days=3)        # Start 3 days in the past, since GSC don't show latest data.
    start_date = end_date -\
         relativedelta.relativedelta(days=num_days) # Get end date minus the number of days to go back in time.
    delta = datetime.timedelta(days=1)              # This will let us loop one day at the time
    scDict = defaultdict(list)                      # initialize empty Dict to store data

    while start_date <= end_date:                   # Loop through all dates until start_date is equal to end_date.
        
        # If a GSC csv file exists from previous extraction
        # and dates in the file match to dates we are extracting...
        if csv_dt is not None and \
            csv_dt.str.contains(dt_to_str(start_date)).any(): 
            print('Existing Date: %s' % start_date) #... Print the date
            start_date += delta                     #... and increment without extraction  
        else:                                       # If the file doesn't exist, or date don't match...
            # ... Print and start the extraction
            print('Start date at beginning: %s' % start_date)

            maxRows = 25000 # Maximum 25K per call 
            numRows = 0     # Start at Row Zero
            status = ''     # Initialize status of extraction


            while (status != 'Finished') : # As long as today's data have not been fully extracted.
                # Extract this information from GSC
                request = {
                    'startDate': dt_to_str(start_date),     # Get today's date (while loop)
                    'endDate': dt_to_str(start_date),       # Get today's date (while loop)
                    'dimensions': ['date','page','query'],  # Extract This information
                    'rowLimit': maxRows,                    # Set number of rows to extract at once (max 25k)
                    'startRow': numRows                     # Start at row 0, then row 25k, then row 50k... until with all.
                }

                response = execute_request(webmasters_service, site, request)
                
                #Process the response
                try:
                    for row in response['rows']:
                        scDict['date'].append(row['keys'][0] or 0)    
                        scDict['page'].append(row['keys'][1] or 0)
                        scDict['query'].append(row['keys'][2] or 0)
                        scDict['clicks'].append(row['clicks'] or 0)
                        scDict['ctr'].append(row['ctr'] or 0)
                        scDict['impressions'].append(row['impressions'] or 0)
                        scDict['position'].append(row['position'] or 0)
                    print('successful at %i' % numRows)

                except:
                    print('error occurred at %i' % numRows)

                # Add response to dataframe 
                df = pd.DataFrame(data = scDict)
                df['clicks'] = df['clicks'].astype('int')
                df['ctr'] = df['ctr']*100
                df['impressions'] = df['impressions'].astype('int')
                df['position'] = df['position'].round(2)

                # Increment the 'start_row'
                print('Numrows at the start of loop: %i' % numRows)
                try: 
                    numRows = numRows + len(response['rows'])
                except:
                    status = 'Finished'                 # If no response left, change status
                print('Numrows at the end of loop: %i' % numRows)
                if numRows % maxRows != 0:              # If numRows not divisible by 25k...
                    status = 'Finished'                 # change status, you have covered all lines.                
        
            start_date += delta                         # Increment start_date to continue the loop
            print('Start date at end: %s' % start_date) 
            write_to_csv(df,full_path)                  # Write today's data to CSV file.
        df = pd.read_csv(full_path)
    return df

Step 7: Execute Request

site = 'https://www.yoursite.com'    # Property to extract
num_days = 5                         # Number of Days, Months to Extract
creds = 'client_secrets.json'        # Credential file from GSC
output = 'gsc_data.csv'
df = extract_data(site,creds,num_days,output)
df.sort_values('clicks',ascending=False)

Full Code

import pandas as pd
import datetime
from datetime import date, timedelta
import httplib2
from googleapiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
from collections import defaultdict
from dateutil import relativedelta
import argparse
from oauth2client import client
from oauth2client import file
from oauth2client import tools
import re
import os
from urllib.parse import urlparse

site = 'https://www.yoursite.com'    # Property to extract
num_days = 5                    # Number of Days, Months to Extract
creds = 'client_secrets.json'   # Credential file from GSC
output = 'gsc_data.csv'

# Get Domain Name to Create a Project
def get_domain_name(start_url):
    domain_name = '{uri.netloc}'.format(uri=urlparse(start_url))  # Get Domain Name To Name Project
    domain_name = domain_name.replace('.','_')
    return domain_name


# Create a project Directory for this website
def create_project(directory):
    if not os.path.exists(directory):
        print('Create project: '+ directory)
        os.makedirs(directory)

def authorize_creds(creds):
    # Variable parameter that controls the set of resources that the access token permits.
    SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly'] 

    # Path to client_secrets.json file
    CLIENT_SECRETS_PATH = creds

    # Create a parser to be able to open browser for Authorization
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[tools.argparser])
    flags = parser.parse_args([])

    flow = client.flow_from_clientsecrets(
        CLIENT_SECRETS_PATH, scope = SCOPES,
        message = tools.message_if_missing(CLIENT_SECRETS_PATH))

    # Prepare credentials and authorize HTTP
    # If they exist, get them from the storage object
    # credentials will get written back to a file.
    storage = file.Storage('authorizedcreds.dat')
    credentials = storage.get()

    # If authenticated credentials don't exist, open Browser to authenticate
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
    http = credentials.authorize(http=httplib2.Http())
    webmasters_service = build('webmasters', 'v3', http=http)
    return webmasters_service

# Create Function to execute your API Request
def execute_request(service, property_uri, request):
    return service.searchanalytics().query(siteUrl=property_uri, body=request).execute()

# Create function to write to CSV
def write_to_csv(data,filename):
    if not os.path.isfile(filename):
        data.to_csv(filename)
    else: # else it exists so append without writing the header
        data.to_csv(filename, mode='a', header=False)

# Read CSV if it exists to find dates that have already been processed.
def get_dates_from_csv(path):
    if os.path.isfile(path):
        data = pd.read_csv(path)
        data = pd.Series(data['date'].unique())
        return data
    else:
        pass

   
# Create function to extract all the data
def extract_data(site,creds,num_days,output):
    domain_name = get_domain_name(site)
    create_project(domain_name)
    full_path = domain_name + '/' + output
    current_dates = get_dates_from_csv(full_path)

    webmasters_service = authorize_creds(creds)

    # Set up Dates
    end_date = datetime.date.today() - relativedelta.relativedelta(days=3)
    start_date = end_date - relativedelta.relativedelta(days=num_days)
    delta = datetime.timedelta(days=1) # This will let us loop one day at the time
    scDict = defaultdict(list)

    while start_date <= end_date:
        if current_dates is not None and current_dates.str.contains(datetime.datetime.strftime(start_date,'%Y-%m-%d')).any():
            print('Existing Date: %s' % start_date)
            start_date += delta     
        else:
            print('Start date at beginning: %s' % start_date)

            maxRows = 25000 # Maximum 25K per call 
            numRows = 0     # Start at Row Zero
            status = ''     # Initialize status of extraction


            while (status != 'Finished') : # Test with i < 10 just to see how long the task will take to process.
                request = {
                    'startDate': datetime.datetime.strftime(start_date,'%Y-%m-%d'),
                    'endDate': datetime.datetime.strftime(start_date,'%Y-%m-%d'),
                    'dimensions': ['date','page','query'],
                    'rowLimit': maxRows, 
                    'startRow': numRows
                }

                response = execute_request(webmasters_service, site, request)

                try:
                #Process the response
                    for row in response['rows']:
                        scDict['date'].append(row['keys'][0] or 0)    
                        scDict['page'].append(row['keys'][1] or 0)
                        scDict['query'].append(row['keys'][2] or 0)
                        scDict['clicks'].append(row['clicks'] or 0)
                        scDict['ctr'].append(row['ctr'] or 0)
                        scDict['impressions'].append(row['impressions'] or 0)
                        scDict['position'].append(row['position'] or 0)
                    print('successful at %i' % numRows)

                except:
                    print('error occurred at %i' % numRows)

                #Add response to dataframe 
                df = pd.DataFrame(data = scDict)
                df['clicks'] = df['clicks'].astype('int')
                df['ctr'] = df['ctr']*100
                df['impressions'] = df['impressions'].astype('int')
                df['position'] = df['position'].round(2)

                print('Numrows at the start of loop: %i' % numRows)
                try: 
                    numRows = numRows + len(response['rows'])
                except:
                    status = 'Finished'
                print('Numrows at the end of loop: %i' % numRows)
                if numRows % maxRows != 0:
                    status = 'Finished'
        
            start_date += delta  
            print('Start date at end: %s' % start_date) 
            write_to_csv(df,full_path)
    return df

df = extract_data(site,creds,num_days,output)
df.sort_values('clicks',ascending=False)

You can run the task daily using Windows Task Scheduler and save the data to a CSV, or a MySQL database.

Automate Your Python Script

To automate the Google Search Console API queries, schedule your python script on Windows task scheduler, or automate python script using CRON on Mac.


Google Search Console FAQs

📙What is Google Search Console?

Google Search Console (GSC), formerly Google Webmaster Tool (GWT), is the platform that Google offers to help monitor a website’s search performance. It provides tools that let webmasters test whether a page is indexed, how it is rendered and other traffic metrics like clicks, impressions, click-through-rate and position in Google.

What Is the Google Search Console API?

The Google Search Console API is a free service that lets you extract data from the search performance report. You can then store it and use it however it pleases you.

🩸How to get Google Search Console API Key?

You can connect to the API using API keys from the google developer console. Follow this guide to help you get the detailed steps to get your Google Search Console API key.