Python Libraries for SEO – Beginner Guide

This post is part of the complete Guide on Python for SEO

How do you use basic Python packages like pandas, NumPy, Matplotlib, Seaborn and Requests for SEO?

In this post, I will show you how to use each package to help you work with Python.

Getting Started With Python Packages

If you haven’t installed Python on your computer yet, make sure that you read this easy step-by-step guide to install Python using Anaconda.

Also, make sure that you know how to use Spyder IDE, Jupyter notebook or the IPython console before you read this guide. You can find all this information by reading my previous tutorial to help you learn Python or start learning Python for SEO from scratch.

What Packages Will We Cover?

In this guide, we will cover the most useful packages that are used in Data Science.

If you followed the advice outlined in the preface and installed Python using Anaconda, you already have all these packages installed and ready to go.


Install and use the Packages

Before you can get started using NumPy, Pandas, and Matplotlib, you need to install the packages. If you have installed Python using Anaconda, you can skip the first step.

Step #1: Install the packages (optional)

Open the command prompt and run each line, one at a time.

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install requests

Step #2: Load the libraries

To load the libraries, use the import statement.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



NumPy, short for Numerical Python, provides efficient storage and manipulation of numerical arrays that you can use for advanced calculations instead of regular Python lists.

Import Numpy

import numpy as np

NumPy Arrays

The arrays, like the lists, are used to store multiple values in one single variable. The main difference is that you can perform calculations over entire arrays.

A regular list doesn’t allow you to do this.

Let’s say we want to compare three days of traffic data for two groups. Adding two regular lists with + simply pastes one list after the other:

## [8, 9, 13, 2, 6, 5]

What we wanted instead of a pasted-together list is the element-wise sum, [10, 15, 18]. Hence the use of NumPy.

NumPy Arrays Calculations

First, let’s import NumPy.

import numpy as np

Then let’s add up our two groups and three days’ worth of data, but this time using NumPy arrays. The result is the element-wise sum:

## array([10, 15, 18])

Operations on NumPy arrays are performed element-wise.

It will add the first element of the first list to the first element of the second list, and so on.

In our case, it will perform this way: array([8+2 , 9+6 , 13+5]).
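As a minimal sketch of the difference, the snippet below uses the traffic values from the example. The names group_a and group_b are assumptions made for illustration; ttl_seo is the name used for the summed array further down.

```python
import numpy as np

# Three days of traffic for two groups (illustrative names).
group_a = [8, 9, 13]
group_b = [2, 6, 5]

# With plain lists, + concatenates instead of adding.
pasted = group_a + group_b

# With NumPy arrays, + adds element by element.
ttl_seo = np.array(group_a) + np.array(group_b)

print(pasted)   # [8, 9, 13, 2, 6, 5]
print(ttl_seo)  # [10 15 18]
```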

Remember that unlike Python lists, a NumPy array holds elements of a single type. If the types you pass do not match, NumPy converts every element to one common type.

np.array(["my domain authority is", 89])
## array(['my domain authority is', '89'], dtype='<U22')

Here, both the string and the integer were converted to strings.

You can also select some data using arrays.

# Indexing a single element (ttl_seo is the summed array from above)
ttl_seo[2]
## 18
# Comparing every element against a value
ttl_seo > 10
## array([False,  True,  True])

Create Arrays Using NumPy

You can also create arrays from scratch using NumPy functions.

You could use np.arange() to build an ordered list of values up to the value you select; np.zeros() to create arrays filled with zeros; np.reshape() to build multi-dimensional arrays; and so on.

Here are a few useful functions.

Build an ordered list


Build an ordered list over a range, with a fixed step between values

## array([6,  9, 12, 15, 18, 21, 24, 27])

Create a list filled with zeros

np.zeros(6, dtype=int)
## array([0, 0, 0, 0, 0, 0])

Fill a matrix with a single value

##array([[4.15, 4.15, 4.15, 4.15, 4.15],
##       [4.15, 4.15, 4.15, 4.15, 4.15],
##       [4.15, 4.15, 4.15, 4.15, 4.15],
##       [4.15, 4.15, 4.15, 4.15, 4.15],
##       [4.15, 4.15, 4.15, 4.15, 4.15]])

Create a 2D array

## array([[0, 1, 2, 3],
##       [4, 5, 6, 7]])

Create a 3D array

## array([[[0, 1],[2, 3]],
##       [[4, 5],[6, 7]]])

Create a 2D array with random values between 0 and 1

## array([[0.48556036, 0.94031317],
##         [0.01495329, 0.79882602]])

Copy an array

copy = np.arange(5).copy()
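The creation functions above can be sketched as follows. The arguments are assumptions chosen to match the outputs listed in this section; the call under "Build an ordered list" has no output to match, so np.arange(10) is only a guess at a typical example.

```python
import numpy as np

ordered = np.arange(10)                  # 0 to 9
stepped = np.arange(6, 30, 3)            # from 6 up to (not including) 30, step of 3
zeros = np.zeros(6, dtype=int)           # six integer zeros
filled = np.full((5, 5), 4.15)           # 5x5 matrix filled with 4.15
two_d = np.arange(8).reshape(2, 4)       # 2 rows, 4 columns
three_d = np.arange(8).reshape(2, 2, 2)  # 2 blocks of 2x2
random_2d = np.random.random((2, 2))     # uniform random values in [0, 1)

print(stepped)  # [ 6  9 12 15 18 21 24 27]
```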

Arrays Slicing

As we have seen with lists in our beginner Guide to Python, it is possible to access subarrays with the slice notation, using the colon (:).

Create an array

x = np.arange(10)
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Select the first five elements

#array([0, 1, 2, 3, 4])

Select elements after index 5

#array([5, 6, 7, 8, 9])

Select the range of values

#array([2, 3, 4])

Select a range of elements jumping two steps

#array([0, 2, 4, 6, 8])

Select every element starting at index 1, jumping two steps

#array([1, 3, 5, 7, 9])
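The slices that produce the outputs above can be written as follows. This is a short sketch; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)

first_five = x[:5]     # elements before index 5
from_five = x[5:]      # elements from index 5 onward
middle = x[2:5]        # indexes 2, 3 and 4
every_other = x[::2]   # every second element, starting at index 0
odd_indexes = x[1::2]  # every second element, starting at index 1

print(middle)  # [2 3 4]
```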

Multi-Dimensional NumPy Arrays

Using NumPy, you can easily create multi-dimensional arrays.

Let’s create a 2D array.

x = np.arange(8).reshape(2,4) 
## array([[0, 1, 2, 3],
##       [4, 5, 6, 7]])

To find what shape our “x” array has, use the shape attribute.

## (2, 4)

Let’s find other information about our array.

#Find the dimension

#Find the size of the array

#Find the Type

Let’s change one value to a different data type.

#Look first position of the array

#Change it to a float

NumPy has truncated the float to an integer, because the array has an integer data type.

A NumPy array always keeps a single data type, and values assigned to it are converted automatically.
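Here is a minimal sketch of those inspection steps. The value 3.14 is an arbitrary float chosen for illustration.

```python
import numpy as np

x = np.arange(8).reshape(2, 4)

print(x.shape)  # (2, 4)
print(x.ndim)   # number of dimensions: 2
print(x.size)   # total number of elements: 8
print(x.dtype)  # an integer type (int64 or int32, depending on the platform)

# Assigning a float to an integer array drops the decimal part.
x[0, 0] = 3.14
print(x[0, 0])  # 3
```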

2D Arrays Subsetting

There are multiple ways to subset 2-dimensional arrays.

Create a Matrix

x = np.arange(8).reshape(2,4)
## array([[0, 1, 2, 3],
##       [4, 5, 6, 7]])

Select the 1st row

## array([0, 1, 2, 3]) 

Same as previous notation

## array([0, 1, 2, 3]) 

Select 1st row, 4th column

## 3

Select 1st row, 4th column (alternative notation)

## 3

Select fourth column

## array([3, 7])

Select the intersection between the 2nd and 3rd column and both rows

## array([[1, 2],
##       [5, 6]])
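The selections above can be sketched like this. The two notations for a single cell are assumed to be x[0][3] and x[0, 3], which both return the same value.

```python
import numpy as np

x = np.arange(8).reshape(2, 4)

first_row = x[0]          # 1st row
first_row_alt = x[0, :]   # same row, explicit "all columns"
cell = x[0][3]            # 1st row, 4th column (chained indexing)
cell_alt = x[0, 3]        # 1st row, 4th column (single index)
fourth_column = x[:, 3]   # 4th column of every row
block = x[:, 1:3]         # 2nd and 3rd columns of both rows

print(block)
```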

Randomize With NumPy

A really useful feature of NumPy is its ability to generate random numbers.

You can do this using the rand() function.

import matplotlib.pyplot as plt
random_data = np.random.rand(1000)
plot = plt.scatter(range(1000),random_data)

Use NumPy to Visualize The Gaussian Distribution

The Gaussian distribution, also known as the normal distribution, is a central concept in statistics.

import numpy as np
import matplotlib.pyplot as plt
from math import sqrt, pi, exp

domaine = range(-100,100)
mu = 0
sigma = 20

f = lambda x : 1/(sqrt(2*pi*pow(sigma,2))) * exp(-pow((x-mu),2)/(2*pow(sigma,2)))

y = [f(x) for x in domaine]
plot = plt.plot(domaine, y)

What does it tell us?

The central limit theorem states that when you add up many independent random values, the sums tend to follow the bell-shaped distribution that we have just seen. When they do, the distribution is considered normal.

Let’s do an example.

We will generate random data and plot the histogram to see if it follows a normal distribution.

1. Create a Random Matrix with 1000 random variables, and a 100k sample for each of those variables.

normal_matrix =np.random.rand(1000,100000)

2. Sum the random variable of each column. This will help you visualize a distribution.

matrix_sum = np.sum(normal_matrix,0)

Note that if you set the axis to 0, NumPy adds the rows together and returns one total per column; if you set it to 1, it returns one total per row. If you pass no axis at all, it sums every element of the matrix.
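A small matrix makes the axis argument easier to see. The values below are made up for illustration.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

column_totals = np.sum(m, 0)  # rows added together: one total per column
row_totals = np.sum(m, 1)     # columns added together: one total per row
grand_total = np.sum(m)       # every element

print(column_totals)  # [5 7 9]
print(row_totals)     # [ 6 15]
print(grand_total)    # 21
```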

3. Plot Histogram

plot = plt.hist(matrix_sum, bins=1000)

4. Print Details of Your Distribution

As a data scientist, you will need to generate reports to justify your actions.

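print("The mean of your distribution is {}.".format(np.mean(matrix_sum)))
print("The mean distribution generated by rand is {}.".format(np.mean(normal_matrix)))
print("The variance is {}.".format(np.var(matrix_sum)))
print("The variance of rand is {}.".format(np.var(normal_matrix)))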

##The mean of your distribution is 500.04352575358934.
##The mean distribution generated by rand is 0.4999905658429283.
##The variance is 82.77889479273443.
##The variance of rand is 0.08342877035074636.

NumPy Statistical Functions

There are plenty of other operations that you can do using NumPy.

As a Data Scientist, you’ll be faced with a number of statistical problems that can be solved using NumPy Functions. Here is a quick overview of the functions that you might be interested in:
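As a minimal sketch, here are a few of the statistical functions NumPy provides, applied to made-up daily click and impression counts. The data and the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical daily clicks and impressions for one week.
clicks = np.array([120, 135, 128, 160, 142, 138, 151])
impressions = np.array([2400, 2600, 2500, 3100, 2800, 2750, 2950])

average = np.mean(clicks)            # arithmetic mean
middle = np.median(clicks)           # median value
spread = np.std(clicks)              # standard deviation
top_decile = np.percentile(clicks, 90)             # 90th percentile
correlation = np.corrcoef(clicks, impressions)[0, 1]  # Pearson correlation

print(average, middle, spread, top_decile, correlation)
```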

To understand the scope of what you can do, just read how to make a T-Test using NumPy, or this advanced NumPy Tutorial for Data Analysis.



Pandas is one of the basic libraries that you will need in SEO and in Data Science. This library provides data structure and data analysis tools for Python.

Simply put, it is the library for working with data frames.

A data frame is basically a table that is available in languages such as R and Python.

We have seen with Numpy that we could create multi-dimensional arrays easily.

x = np.arange(8).reshape(2,4)
## array([[0, 1, 2, 3],
##       [4, 5, 6, 7]])

This is cool, but not convenient to analyze.

Using Pandas’ data frames, you can create a table.

quarterly_sales = pd.DataFrame(x)

You can make it better by adding row and column names.

quarterly_sales = pd.DataFrame(x,
            index = [2018, 2019],
            columns = ["Q1","Q2","Q3","Q4"])

You can also select data using its label, which is either the column name or the row name. For example, selecting the Q1 column:

quarterly_sales["Q1"]
##2018    0
##2019    4
##Name: Q1, dtype: int32
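Here is a self-contained sketch of selecting by column name and by row name. It assumes the rows are labeled 2018 and 2019, as the output above suggests.

```python
import numpy as np
import pandas as pd

x = np.arange(8).reshape(2, 4)
quarterly_sales = pd.DataFrame(x,
                               index=[2018, 2019],
                               columns=["Q1", "Q2", "Q3", "Q4"])

q1 = quarterly_sales["Q1"]              # select a column by its name
sales_2019 = quarterly_sales.loc[2019]  # select a row by its label

print(q1)
print(sales_2019)
```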

How to Import An Excel File Using Pandas

You can easily import an Excel file into Python using pandas. To do this, you need to use the read_excel function.

import pandas as pd
df = pd.read_excel(r'Path to your document\yourFile.xlsx')

How to Inspect Your DataFrame

There are multiple tools that you can use in Pandas to help you inspect your dataset: head(), tail(), shape, type(), and columns.

Let’s create a new data frame.

import pandas as pd
import numpy as np
x = np.arange(10000).reshape(2000,5)
yearly_sales = pd.DataFrame(x,
            columns = ["2015","2016","2017","2018","2019"])

Preview of Your Dataset Using head() and tail()

When you have a large dataset, you might want to show just the first or last few lines to help you get a good idea of your data structure.

##   2015  2016  2017  2018  2019
##0     0     1     2     3     4
##1     5     6     7     8     9
##2    10    11    12    13    14
##3    15    16    17    18    19
##4    20    21    22    23    24

##      2015  2016  2017  2018  2019
##1995  9975  9976  9977  9978  9979
##1996  9980  9981  9982  9983  9984
##1997  9985  9986  9987  9988  9989
##1998  9990  9991  9992  9993  9994
##1999  9995  9996  9997  9998  9999

Show Rows and Columns of Your DataFrame With Shape

The shape attribute gives information on the dataset size. It returns a tuple with the count of rows and columns: (rows, columns).

##(2000, 5)
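The previews and the shape shown above can be reproduced with this short sketch, which rebuilds the same yearly_sales data frame.

```python
import numpy as np
import pandas as pd

x = np.arange(10000).reshape(2000, 5)
yearly_sales = pd.DataFrame(x,
                            columns=["2015", "2016", "2017", "2018", "2019"])

first_rows = yearly_sales.head()  # first 5 rows by default
last_rows = yearly_sales.tail()   # last 5 rows by default
dimensions = yearly_sales.shape   # (rows, columns)

print(first_rows)
print(last_rows)
print(dimensions)  # (2000, 5)
```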

Show the Type of Your Dataset

Is your dataset a list? A string? A data frame? Find out by using the type() function.


Print Header Names

To list your header names, iterate over the columns using a for loop. This is really useful when you have a really large dataset.

for col in yearly_sales.columns: 
    print(col)

You can also easily see the header names by selecting zero rows of the data frame.

yearly_sales[:0]
##Empty DataFrame
##Columns: [2015,2016,2017,2018,2019]
##Index: []

Or, you can use the columns attribute.


Get the Column Index Using Name With get_loc

You might want to get column index from column name in python pandas. Do it with get_loc.
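A minimal sketch of the type, columns and get_loc steps, using a smaller version of the yearly_sales data frame. The column "2017" is picked arbitrarily.

```python
import numpy as np
import pandas as pd

yearly_sales = pd.DataFrame(np.arange(10).reshape(2, 5),
                            columns=["2015", "2016", "2017", "2018", "2019"])

dataset_type = type(yearly_sales)               # the DataFrame class
header_names = list(yearly_sales.columns)       # column labels as a list
position = yearly_sales.columns.get_loc("2017") # zero-based position of a column

print(dataset_type)
print(header_names)
print(position)  # 2
```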


Most Useful Dataframe Manipulations

There are many great functions in Pandas for data frame manipulation. In this tutorial, I will show you some of the most useful ones for SEO and data science.

Select Only Unique Values

In a dataset, you might end up with duplicates in your columns.

cities = ['montreal',"quebec","montreal","montreal","toronto","vancouver","edmonton"]
cities =pd.DataFrame(cities)
cities.columns = ["city"]

To get the unique values of a column, use the unique() method.
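Applied to the cities data frame, this could look like the sketch below. The values come back in order of first appearance.

```python
import pandas as pd

cities = pd.DataFrame(
    ["montreal", "quebec", "montreal", "montreal",
     "toronto", "vancouver", "edmonton"],
    columns=["city"],
)

# unique() keeps the first occurrence of each value, in order.
unique_cities = cities["city"].unique()

print(list(unique_cities))
```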

Drop NA Values In Rows and Columns

To drop NA values in a DataFrame, use the dropna function. To know more about the function, read the pandas documentation on pydata.org.
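A minimal sketch of dropna on rows and on columns. The data frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical report with a missing click count and an empty column.
report = pd.DataFrame({
    "url": ["/a", "/b", "/c"],
    "clicks": [10, np.nan, 7],
    "notes": [np.nan, np.nan, np.nan],
})

# Drop the rows where "clicks" is missing.
rows_cleaned = report.dropna(subset=["clicks"])

# Drop the columns that contain nothing but NA values.
columns_cleaned = report.dropna(axis=1, how="all")

print(rows_cleaned)
print(columns_cleaned)
```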

Modify Column Names

The rename function helps you rename the rows or columns of a DataFrame.
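For example, with a small cities data frame (the new names "location" and "first" are arbitrary):

```python
import pandas as pd

cities = pd.DataFrame({"city": ["montreal", "quebec", "toronto"]})

# rename returns a new DataFrame; the original is left unchanged.
renamed_columns = cities.rename(columns={"city": "location"})
renamed_rows = cities.rename(index={0: "first"})

print(renamed_columns.columns.tolist())
print(renamed_rows.index.tolist())
```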


Remove Rows or Columns

You can remove rows and columns in your data frame using the drop function.

To remove rows.


This will delete the row with an index equal to 0.

To drop a column.


This will delete the column named “city”.
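Both removals can be sketched as follows. The "visits" column is an assumption added so that something remains after "city" is dropped.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["montreal", "quebec", "toronto"],
    "visits": [120, 80, 95],  # hypothetical values
})

# Remove the row whose index label is 0.
without_first_row = cities.drop(0)

# Remove the column named "city".
without_city = cities.drop(columns="city")

print(without_first_row)
print(without_city)
```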

Select Rows and Columns in DataFrames Using iloc & loc

The easiest way to select and index rows and columns in Python is to use either .iloc or .loc.

Use .iloc to select data by row or column number, and .loc to select data by label or by a conditional statement.

  • df.iloc[<row number>,<column number>]
  • df.loc[<row label>,<column label>]

To select an entire row using iloc:


To select an entire column using iloc:
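Both selections, plus the .loc equivalents, can be sketched on a small made-up data frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  columns=["2015", "2016", "2017", "2018", "2019"])

first_row = df.iloc[0]            # entire first row, by position
first_column = df.iloc[:, 0]      # entire first column, by position
one_value = df.loc[1, "2016"]     # row label 1, column label "2016"
filtered = df.loc[df["2015"] > 0] # rows matching a condition

print(first_row.tolist())     # [0, 1, 2, 3, 4]
print(first_column.tolist())  # [0, 5, 10]
print(one_value)              # 6
```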


Split Columns & Extract Data Using Delimiters

Let’s say that your first column holds values separated by semicolons, as in a CSV document that uses a semicolon delimiter.

## Column A
## 322;435;423
## 111;2443;23556
## 222
## 111;354

To split a column on a delimiter in Pandas, use the str.split function.

newdf = df.iloc[:,0].str.split(";", expand = True)
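Here is a self-contained sketch using the sample values of Column A. Rows with fewer values are padded with missing values.

```python
import pandas as pd

df = pd.DataFrame({"Column A": ["322;435;423",
                                "111;2443;23556",
                                "222",
                                "111;354"]})

# expand=True returns one new column per split value.
newdf = df.iloc[:, 0].str.split(";", expand=True)

print(newdf)
```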

Extract Data Using Regex

To break up a string into columns using regular expressions in pandas, use str.extract, which accepts regex patterns directly. Import the re (Regular expression operations) package as well if you want to compile or test patterns yourself.

import re
import pandas as pd

Create a New Dataset

import pandas as pd
import numpy as np
date = pd.DataFrame(pd.date_range(start='1/01/2019', end='1/05/2019', freq='D'))
randnum = pd.DataFrame(np.random.randint(10,99,size=(5, 1)))
semicolumn = pd.DataFrame(';', index=range(5), columns=list('A'))
cities = ['Montreal',"quebec","toronto","Vancouver","Edmonton"] 
cities =pd.DataFrame(cities)
df = pd.concat([date,randnum,semicolumn,cities], axis=1)
df['combined'] = df.apply(lambda row: ' '.join(row.values.astype(str)), axis=1)

Extract a Column With Dates

df['date'] = df['combined'].str.extract('(....-..-..)', expand=True)

Extract a Column With Numbers

df['integer'] = df['combined'].str.extract(r'( \d\d )', expand=True)

Extract a Column With Text

df['city'] = df['combined'].str.extract(r'([A-Za-z]\w{0,})', expand=True)

Split Column Using Delimiter

df=df['combined'].str.split(";", expand = True) 

Pandas Pivot Tables

Pivot tables are super useful in SEO as well as in Data Science.

Those tables are useful to aggregate and compare large datasets.

import numpy as np
import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
tips.pivot_table('tip', index='sex', columns='time')
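The same idea applies to SEO data. The sketch below works offline on a made-up table of clicks per device and country; all names and values are assumptions for illustration. By default, pivot_table aggregates with the mean.

```python
import pandas as pd

# Hypothetical clicks per device and country.
data = pd.DataFrame({
    "device": ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"],
    "country": ["CA", "US", "CA", "US", "CA", "US"],
    "clicks": [10, 20, 30, 40, 50, 60],
})

# Average clicks for each device (rows) and country (columns).
pivot = data.pivot_table("clicks", index="device", columns="country")

print(pivot)
```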

Other Useful Pandas Functions

As you can now see, not only are there more than enough packages to work data in Python, there are multiple ways of doing the same thing.

Here, I’ll state a few other functions that you might find useful.

Sort Data in Pandas

# Sort by the second column, in descending order
df = df.sort_values(by=df.columns[1], ascending=False)



Matplotlib is a Python package to visualize data, a very important component of data analysis. After this section, you’ll know how to make awesome visualizations for data analysis.

Create a Line Chart With Matplotlib

To create a line chart, you need to use the pyplot submodule.

import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.plot(quarter, sales)

Note that you have to use the plt.show() function to actually display the plot. Like this.

plt.show()

You could also plot various mathematical functions by combining NumPy and Matplotlib.

x = np.linspace(start = 0, stop = 10, num = 1000)
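As a minimal sketch, here is the sine function plotted over that range. The choice of np.sin is an assumption; any NumPy function that accepts an array would work the same way.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(start=0, stop=10, num=1000)
y = np.sin(x)  # sine chosen arbitrarily for the example

# plt.plot returns the list of lines it has drawn.
lines = plt.plot(x, y)
plt.title("sin(x)")

# Call plt.show() to display the figure in an interactive session.
```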

Make Great Scatter Plots

Scatter plots are great visualizations to show the relationships between two variables.

Similar to the line charts, they use cartesian coordinates (x and y-axis) to display the values of two variables.

They can help determine the correlation (how closely the two values move together) between the two variables.

To create the graph, just use the scatter function.

import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.scatter(quarter, sales)
plt.show()

Build a Histogram

A histogram is a type of visualization that uses bars of different heights to show the frequency distribution of your variables.

A histogram is not a bar graph.

The difference between a bar chart and a histogram is that the former is a comparison of discrete variables and shows categorical data.

The latter represents the frequency distribution of continuous variables and presents numerical data.

When you look at a bar graph, you’ll see gaps between the bars.

import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [100,327,225,300]
plt.bar(quarter, sales)
plt.show()

The histogram, however, has no such gaps.

import numpy as np
import matplotlib.pyplot as plt
dataset = np.random.uniform(0.0,10.0,100)
plt.hist(dataset, bins=10)

Modify The Axis: Limits, Ticks, and Scale

Let’s start with our basic line chart.

import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.plot(quarter, sales)
plt.show()

Tip #1: Set Limits

First, you can add limits to your axis using xlim and ylim.

See how you can alter the perception of the data by changing the axis limits? With a wide enough y-axis, the same data looks like there are absolutely no variations from Q1 to Q4.
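A sketch of limits that produce this flattening effect. The bounds 0 and 1000 are assumed values, chosen to be much wider than the range of the sales data.

```python
import matplotlib.pyplot as plt

quarter = ["Q1", "Q2", "Q3", "Q4"]
sales = [320.06, 327.2, 325.3, 330.4]

plt.plot(quarter, sales)

# A y-axis far wider than the data makes the line look flat.
plt.ylim(0, 1000)

# Call plt.show() to display the figure in an interactive session.
```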

Tip #2: Set Ticks

Second, choose the tick positions of your axis using xticks and yticks.

import numpy as np
import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.plot(quarter, sales)
plt.yticks(np.arange(320.0,331.0, step=1.0))
plt.show()

Tip #3: Apply a Different Scale

Last, you could make your scale logarithmic.

You can modify the scale of the X and the Y-axis using xscale for the former, and yscale for the latter.


In this case, it doesn’t make any sense to plot a logarithmic scale, so we’ll skip this step.
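For reference, here is a sketch on data where a logarithmic scale does make sense, because the values span several orders of magnitude. The monthly visits are made up for illustration.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5]
visits = [10, 100, 1000, 10000, 100000]  # hypothetical, tenfold growth each month

plt.plot(months, visits)

# On a log scale, tenfold growth appears as a straight line.
plt.yscale("log")

# Call plt.show() to display the figure in an interactive session.
```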

You could also add color and other funky stuff to your graphs. To learn more about the potential of matplotlib customization, you could read this awesome guide on Towards Data Science.

How to Add a Title And Labels to Your Visualization

To add labels to your axis, use the xlabel and ylabel functions. To add a title, use the title function.

Import the library and create your dataset.

import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]

Add your labels


Add a title.

plt.title("Sales per Quarter")

Create your visualization.
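Put together, the steps could look like the sketch below. The label texts "Quarter" and "Sales" are assumptions; only the title comes from the example above.

```python
import matplotlib.pyplot as plt

quarter = ["Q1", "Q2", "Q3", "Q4"]
sales = [320.06, 327.2, 325.3, 330.4]

plt.plot(quarter, sales)
plt.xlabel("Quarter")            # label of the x-axis (assumed text)
plt.ylabel("Sales")              # label of the y-axis (assumed text)
plt.title("Sales per Quarter")   # title of the chart

# Call plt.show() to display the figure in an interactive session.
```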


Modify Visual Components

We now have a well-labeled graph. Let’s make it more beautiful.

Change Font-Size

To change font size, use rcParams.update.

quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.yticks(np.arange(320.0,331.0, step=1.0))
plt.title("Sales per Quarter")

Don’t like it?

Here is how to reset Matplotlib default settings.
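A minimal sketch of both steps: changing the font size with rcParams.update, then restoring Matplotlib's built-in defaults with rcdefaults. The size 22 is an arbitrary value.

```python
import matplotlib.pyplot as plt

# Enlarge the font for every plot created from now on.
plt.rcParams.update({"font.size": 22})
enlarged = plt.rcParams["font.size"]

# Restore Matplotlib's built-in default settings.
plt.rcdefaults()
restored = plt.rcParams["font.size"]

print(enlarged, restored)
```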


Change The Color of a Graph in MatplotLib

To change the color of your graph in Matplotlib, use the color parameter.

plt.plot(quarter,sales, color="red")

Add a Label and a Legend to Your Graph

Awesome, we’re almost done. Let’s finish by adding a legend to the graph.

plt.plot(quarter,sales, label="Quarterly Sales")
plt.legend(loc="lower left")

Now, the entire code.

import numpy as np
import matplotlib.pyplot as plt
quarter = ["Q1","Q2","Q3","Q4"]
sales = [320.06,327.2, 325.3, 330.4]
plt.yticks(np.arange(320.0,331.0, step=1.0))
plt.title("Sales per Quarter")
plt.plot(quarter,sales, color="red", linestyle="dotted", label="Quarterly Sales")
plt.legend(loc="lower left")
plt.show()

Show Data Uncertainty

Sometimes, as a Data Scientist, you’ll need to make predictions.

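Predictions always come with a degree of uncertainty, usually expressed as a margin of error or a confidence interval.

If you want to show this margin in your graph, you can use plt.errorbar. Note that yerr is the half-length of each error bar in data units; the 0.95 used below is an illustrative margin, not a 95% confidence level.

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 50)
margin = 0.95
y = np.sin(x) + margin * np.random.randn(50)
plt.errorbar(x, y, yerr=margin, fmt=".")
plt.show()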

This is it. If you want to see more customization options with Matplotlib, I suggest that you bookmark this Matplotlib Cheatsheet.



Matplotlib is great but has its flaws. For example, its functions don’t interact very well with Pandas’ DataFrames. Seaborn comes to the rescue.

Seaborn is a layer added to the Matplotlib package. It brings intuitive functions to help solve most problems encountered by the other library.

Import Seaborn Package

import seaborn as sns

Histplot: A Basic Seaborn Function

Let’s recreate our normal distribution graph, this time using Seaborn. Older tutorials use distplot, which has been deprecated since Seaborn 0.11 in favor of histplot and displot.

import numpy as np
import seaborn as sns
normal_matrix = np.random.rand(100,1000)
matrix_sum = np.sum(normal_matrix,0)
sns.histplot(matrix_sum, kde=True)

How to Load a Template DataSet

To build a proper example, let’s load the “Iris” dataset.

Iris is simply a dataset that is frequently used in tutorials.

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species", height=2)

Make a Linear Regression in Seaborn

Seaborn plots share a similar set of parameters. For sns.jointplot, you typically pass three: the x-axis data, the y-axis data, and the dataset.

To make a linear regression, add the optional parameter kind="reg" (for Linear Regression) to those three.

tips = sns.load_dataset("tips")
sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg')

Note that you could also make a linear regression using lmplot() or regplot(). Just follow this awesome guide on linear regression with Seaborn.
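As a rough sketch of the regplot() alternative, here is a version that runs without downloading a dataset. The four rows below are made-up values that only borrow the column names of the tips dataset.

```python
import pandas as pd
import seaborn as sns

# Hypothetical bills and tips, not the real "tips" dataset.
data = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.1, 4.4, 6.2],
})

# regplot draws the scatter points and a fitted regression line,
# and returns the Matplotlib Axes it has drawn on.
ax = sns.regplot(x="total_bill", y="tip", data=data)

# Call plt.show() from matplotlib.pyplot to display the figure interactively.
```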

To learn more about Seaborn, just read the official documentation.



The requests library is one of the most important libraries for SEO. It lets you make HTTP requests to servers using Python.

This is the library that you need to:

  • Check for server response code (2XX, 3XX, 4XX…);
  • Post something;
  • Read a JSON file;

Install Requests

!pip install requests
import requests

Get URL Using requests.get()

Calling a URL is the basis of any SEO request. Whether you call an API or a web page, you will need to use the requests.get() function.

response = requests.get("")
# <Response [200]>

View The Attributes You Can Run

Whenever you want to know the attributes and methods available for a specific object (here response), you can use the dir() function.


Here, you can see a bunch of useful functions you can use to access content within this response object, such as:

  • status_code;
  • text;
  • headers;
  • is_redirect
  • json.

Get Response Code

response = requests.get("")
response.status_code
# 200

Get Content Using Text

You can get the content of a web page with the text attribute of the response.

The text attribute returns the content of the response as a Unicode string.

response.text

As you can see, the request returned the HTML content of the page as text. This is useful.

However, it is not the best way to extract data from HTML. If you want to “interpret” (i.e. parse) this data, you should use an HTML parser like BeautifulSoup or Requests-HTML.

Get HTTP Header

You can review your HTTP response headers using the headers attribute. This can provide great information about your SEO performance.


You can also select a specific element of the HTTP header.
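The attributes covered in this section can be gathered in one place, as in the sketch below. The helper summarize_response is a hypothetical function written for this example, and "https://www.example.com" is a placeholder URL; the network call is left commented out so the snippet runs offline.

```python
import requests


def summarize_response(response):
    """Collect the response attributes discussed above into a dictionary."""
    return {
        "status_code": response.status_code,                       # e.g. 200, 301, 404
        "is_redirect": response.is_redirect,                       # True for a redirect with a Location header
        "content_type": response.headers.get("Content-Type"),     # one specific HTTP header
        "html_preview": response.text[:60],                        # first characters of the body
    }


# Placeholder URL; uncomment to try it against a real page.
# response = requests.get("https://www.example.com", timeout=10)
# print(summarize_response(response))
```

Using headers.get() instead of square brackets returns None when a header is missing, rather than raising an error.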


This is it.

We now have covered the basics of Python for SEO. We have seen how to use Numpy, Pandas, Matplotlib, Seaborn, and Requests packages in Python. Sure, there is plenty more to learn. Keep in touch. Next, we’ll cover what we can do with Pandas for SEO.