Google Analytics Reporting API v4 using Python [2020 update]

This post is part of the complete Guide on Python for SEO

This is the best place to learn how to set up Google Analytics Reporting API v4 via Python, using Spyder IDE.

(simple step-by-step guide)

Of course, you will need some basics in Python to fully understand, but I will guide you through each step carefully.

In case you feel like you lack what you need to complete this guide, I made a complete series to help you learn Python for SEO.

To make an API call to Google Analytics Reporting API using Python, follow these steps.

  1. Setup Python

  2. Create a New Project and Service Account

  3. Install Google API Client in Python

  4. Connect Python to Google Analytics Reporting API

  5. Make the API Call

  6. Parse The Report Data

  7. Create a DataFrame With Your Data

  8. Export to CSV

1. Setup Python

I will not go extensively into how to get set up with Python, since I have already made a few guides on the subject.

Make sure that you have followed those preliminary steps before you get started using Google Analytics Reporting API.

  1. Install Python
  2. Learn Basics in Python (extensive guide)
  3. Run Python With Spyder IDE

2. Create a New Project and Service Account

It is now time to create a new project to help you use the Google Analytics Reporting API.

You will need to create a “service account key” in the Google Developers Console and download the JSON key file.

I will not show you extensively how to get your JSON credentials file here, but I understand that this part can get confusing.

This is why I have made a simple beginner guide to help you get connected to Google Analytics API in Google developer console.

Otherwise here are the simple steps:

  1. Sign in to Google’s developers console
  2. Create a Project in Google API
  3. Activate the Google Analytics Reporting API
  4. On the Credentials page, click Create credentials and select Service account key.
  5. Create a JSON key and download it to your computer.
  6. Log in to Google Analytics
  7. In the Admin, select the view that you want to extract data from and go to User Management.
  8. Open the service account JSON key file that you just downloaded in a text editor.
  9. Copy the client_email and give this email read permissions on the view.
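If you prefer not to dig through the key file by hand, the steps above can be sketched in Python: the downloaded key is just a JSON file, and the address you need to authorize sits under the client_email field. (The file name here is hypothetical; use whatever you saved your key as.)

```python
import json

# Read the downloaded service-account key file and return the
# client_email you need to authorize in Google Analytics (step 9).
def get_client_email(key_path):
    with open(key_path) as f:
        return json.load(f)["client_email"]
```

Granting this email read access on the view is what allows the service account to query your data.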

3. Install Google API Client in Python

Go to Anaconda Prompt and run this command:

conda install -c conda-forge google-api-python-client

If you are using pip, run this command from your command line:

pip install --upgrade google-api-python-client

4. Connect Python to Google Analytics Reporting API

Now it is time to connect Python to the Google Analytics API using Spyder IDE.

To connect to this API, follow these steps.

  1. Create a Project in Spyder IDE
  2. Rename your JSON key to client_secrets.json
  3. Save the JSON file to your working directory
  4. Load this bit of code.
#Load Libraries
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
import httplib2

#Create service credentials
#Rename your JSON key to client_secrets.json and save it to your working folder
credentials = ServiceAccountCredentials.from_json_keyfile_name('client_secrets.json', ['https://www.googleapis.com/auth/analytics.readonly'])
 
#Create a service object
http = credentials.authorize(httplib2.Http())
service = build('analytics', 'v4', http=http, discoveryServiceUrl=('https://analyticsreporting.googleapis.com/$discovery/rest'))

5. Make Your First API Call

To make your first API call you will need to get your view ID from Google Analytics (remember to use the same view that you granted your service account access to).

Go to Google Analytics > Admin > View > View Settings and copy the View ID.

Get View ID from Google Analytics

Once you have your view ID, you are ready to make your first API call.

There are many queries you can make with the batchGet reports. Just play around with Google’s Reporting API request composer.

In my case, I needed a report of the top 100 pages by sessions for the last 30 days, sorted by sessions and filtered to pagePaths containing “product”, while excluding the pages generated by Google Translate.

For advanced filtering, see Google’s documentation.
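As a quick reference, here is a hedged sketch of the filtersExpression operators used below: =@ means “contains”, !@ means “does not contain”, =~ matches a regular expression, and conditions are joined with ; (AND) or , (OR). These are just strings you pass in the request body.

```python
# filtersExpression syntax examples
contains_products  = "ga:pagePath=@products"    # pagePath contains "products"
matches_regex      = "ga:pagePath=~^/products/" # pagePath matches a regex
excludes_translate = "ga:pagePath!@translate"   # pagePath does not contain "translate"

# ";" combines conditions with AND, "," with OR
combined = contains_products + ";" + excludes_translate
print(combined)  # ga:pagePath=@products;ga:pagePath!@translate
```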

response = service.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': 'XXXXXXXX', #Add View ID from GA
                'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:sessions'}], 
                'dimensions': [{"name": "ga:pagePath"}], #Get Pages
                "filtersExpression": "ga:pagePath=~products;ga:pagePath!@googleusercontent.com/translate", #Contains "products", excludes Google Translate pages
                'orderBys': [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}], 
                'pageSize': 100
            }]
    }
).execute()

We have now made our first call.

Quick Tip!

You could also use Python f-strings (similar to template literals in JavaScript) to add dynamic filters.

To do this, prefix the string with f and use curly brackets {} like this: f"ga:pagePath={regex}".

regex='~^/see-company/.*'

response = service.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': '30029063',
                'dateRanges': [{'startDate': '90daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:sessions'}],
                'dimensions': [{"name": "ga:pagePath"}],
                "filtersExpression":f"ga:pagePath={regex}", 
                'orderBys': [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}],
                'pageSize': 10000
            }]
    }
).execute()
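Note that pageSize caps how many rows come back per request. If your view has more rows than that, the report includes a nextPageToken that you pass back as pageToken in the next reportRequest. Here is a hedged sketch of that pagination loop, with a stubbed fetch function standing in for the real service.reports().batchGet(body=...).execute() call:

```python
# fake_batch_get stands in for the real batchGet call; each fake page
# carries one row, and the first page points at the second via nextPageToken.
def fake_batch_get(page_token):
    pages = {
        None: {'reports': [{'data': {'rows': [{'dimensions': ['/a']}]},
                            'nextPageToken': '1'}]},
        '1':  {'reports': [{'data': {'rows': [{'dimensions': ['/b']}]}}]},
    }
    return pages[page_token]

rows, token = [], None
while True:
    report = fake_batch_get(token)['reports'][0]
    rows.extend(report.get('data', {}).get('rows', []))
    token = report.get('nextPageToken')  # pass as 'pageToken' in the next reportRequest
    if not token:
        break

print(len(rows))  # 2
```

With the real API you would rebuild the request body each iteration, adding 'pageToken': token when it is set.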

As you can see, you don’t have a usable report yet. This is because you need to parse your report data.

6. Parsing The Report Data

We will now extract data in order to use it.

First, we create two empty lists to store the data. Then, we read the response and extract the values that we want.

#create two empty lists that will hold our dimensions and sessions data
dim = []
val = []
 
#Extract Data
for report in response.get('reports', []):
 
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])
 
    for row in rows:
 
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])
 
        for header, dimension in zip(dimensionHeaders, dimensions):
            dim.append(dimension)
 
        for i, values in enumerate(dateRangeValues):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(int(value))
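To see what that loop is actually doing, here is a minimal sketch run against a hand-built dict that mimics the shape batchGet returns (the pagePaths and session counts are made up):

```python
# A hypothetical response in the same shape as the real batchGet payload
response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:pagePath'],
            'metricHeader': {'metricHeaderEntries': [{'name': 'ga:sessions'}]},
        },
        'data': {'rows': [
            {'dimensions': ['/products/a'], 'metrics': [{'values': ['42']}]},
            {'dimensions': ['/products/b'], 'metrics': [{'values': ['17']}]},
        ]},
    }]
}

# Same parsing logic as above: one dimension and one metric per row
dim, val = [], []
for report in response.get('reports', []):
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    for row in report.get('data', {}).get('rows', []):
        for header, dimension in zip(dimensionHeaders, row.get('dimensions', [])):
            dim.append(dimension)
        for values in row.get('metrics', []):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(int(value))

print(dim)  # ['/products/a', '/products/b']
print(val)  # [42, 17]
```

Each row contributes one entry to dim (the pagePath) and one to val (the session count cast to int), so the two lists stay aligned by index.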

7. Create a DataFrame With Your Data

import pandas as pd

df = pd.DataFrame() 
df["Sessions"]=val
df["pagePath"]=dim
df=df[["pagePath","Sessions"]]

8. Export to CSV

Once you have a DataFrame, you can do pretty much whatever you want with your data.

Just to make sure that you are good to go, let’s export the file to CSV.

df.to_csv("page_by_session.csv")
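To check the export end to end, here is a minimal sketch with made-up data standing in for the dim and val lists from step 6:

```python
import pandas as pd

# Hypothetical parsed data (in the real script these come from step 6)
dim = ['/products/a', '/products/b']
val = [42, 17]

df = pd.DataFrame({"pagePath": dim, "Sessions": val})
# index=False keeps pandas' row index out of the file
df.to_csv("page_by_session.csv", index=False)
```

Without index=False (as in the call above), pandas writes the row index as an unnamed first column, which you may not want when sharing the CSV.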

Full Reusable Code

#Load Libraries
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
import httplib2
import pandas as pd

#Create service credentials
#Rename your JSON key to client_secrets.json and save it to your working folder
credentials = ServiceAccountCredentials.from_json_keyfile_name('client_secrets.json', ['https://www.googleapis.com/auth/analytics.readonly'])
 
#Create a service object
http = credentials.authorize(httplib2.Http())
service = build('analytics', 'v4', http=http, discoveryServiceUrl=('https://analyticsreporting.googleapis.com/$discovery/rest'))

response = service.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': 'XXXXXXXX', #Add View ID from GA
                'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:sessions'}], 
                'dimensions': [{"name": "ga:pagePath"}], #Get Pages
                "filtersExpression": "ga:pagePath=~products;ga:pagePath!@googleusercontent.com/translate", #Contains "products", excludes Google Translate pages
                'orderBys': [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}], 
                'pageSize': 100
            }]
    }
).execute()

#create two empty lists that will hold our dimensions and sessions data
dim = []
val = []
 
#Extract Data
for report in response.get('reports', []):
 
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])
 
    for row in rows:
 
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])
 
        for header, dimension in zip(dimensionHeaders, dimensions):
            dim.append(dimension)
 
        for i, values in enumerate(dateRangeValues):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(int(value))

#Reverse the lists (the API returned sessions in descending order)
val.reverse()
dim.reverse()

df = pd.DataFrame() 
df["Sessions"]=val
df["pagePath"]=dim
df=df[["pagePath","Sessions"]]
print(df)

#Export to CSV
df.to_csv("page_by_session.csv")

Conclusion

This is the end of the tutorial.

If you would like to create advanced visualizations from your data, I would suggest that you read Nicolas Lierman’s post that inspired this post and that will show you how to create Matplotlib visualizations from your Google Analytics data.

To replicate something like this using Jupyter, follow this guide by Woratana Perth.

I’d love to see more extractions that you were able to make using Google Analytics Reporting API with Python. Let me know in the comments.