The Python requests library is one of the most-used libraries to make HTTP requests using Python.
In this tutorial, you will learn how to:
- Understand the structure of a request
- Make GET and POST requests
- Read and extract elements of the HTML of a web page
- Improve your requests
Install Packages
For this guide, you will need to install Python and install the following packages.
$ pip install requests
$ pip install beautifulsoup4
$ pip install urllib
Requests Methods
- get: Request data
- post: Publish data
- put: Replace data
- patch: Make Partial changes to the data
- delete: Delete data
- head: Similar to get request but without the body
- Request: Create request object by specifying the method to choose
Get Requests
import requests
url = 'https://crawler-test.com/'
response = requests.get(url)
print('URL: ', response.url)
print('Status code: ', response.status_code)
print('HTTP header: ', response.headers)
Output:
URL: https://crawler-test.com/
Status code: 200
HTTP header: {'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sun, 03 Oct 2021 23:41:59 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8098', 'Connection': 'keep-alive'}
Post Requests
import requests
payload = {
'name':'Jean-Christophe',
'last_name':'Chouinard',
'website':'https://www.jcchouinard.com/'
}
response = requests.post(, data = payload)
response.json()
Output:
{'args': {},
'data': '',
'files': {},
'form': {'last_name': 'Chouinard',
'name': 'Jean-Christophe',
'website': 'https://www.jcchouinard.com/'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Content-Length': '85',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-615a4271-417e9fff3c75f47f3af9fde2'},
'json': None,
'origin': '149.167.130.162',
'url': 'https://httpbin.org/post'}
Response Methods and Attributes
The response object contains the server’s response to the HTTP request.
You can investigate the details of the Response object by using help()
.
import requests
url = 'https://crawler-test.com/'
response = requests.get(url)
help(response)
In this tutorial we will look at the following:
- text, data descriptor : Content of the response, in unicode.
- content, data descriptor : Content of the response, in bytes.
- url, attribute : URL of the request
- status_code, attribute : Status code returned by the server
- headers, attribute : HTTP headers returned by the server
- history, attribute : list of response objects holding the history of request
- links, attribute : Returns the parsed header links of the response, if any.
- json, method : Returns the json-encoded content of a response, if any.
Access the Response Methods and Attributes
The response from the request is an object in which you can access its methods and attributes.
You can access the attributes using the object.attribute
notation and the methods using the object.method()
notation.
import requests
url = 'http://archive.org/wayback/available?url=jcchouinard.com'
response = requests.get(url)
response.text # access response data atributes and descriptors
response.json() # access response methods
{'url': 'jcchouinard.com',
'archived_snapshots': {'closest': {'status': '200',
'available': True,
'url': 'http://web.archive.org/web/20210930032915/https://www.jcchouinard.com/',
'timestamp': '20210930032915'}}}
Process the Response
Show Status Code
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.status_code
# 200
Get HTML of the page
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.text # get content as a string
r.content # get content as bytes
Show HTTP header
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.headers
{'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Tue, 05 Oct 2021 04:23:27 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8099', 'Connection': 'keep-alive'}
Show redirections
import requests
url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url)
for redirect in r.history:
print(redirect.url, redirect.status_code)
print(r.url, r.status_code)
https://crawler-test.com/redirects/redirect_chain_allowed 301
https://crawler-test.com/redirects/redirect_chain_disallowed 301
https://crawler-test.com/redirects/redirect_target 200
Parse the HTML with Request and BeautifulSoup
Parsing with BeautifulSoup
from bs4 import BeautifulSoup
import requests
# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)
r.text[:500]
You can see that the text is hard to interpret string.
'<!DOCTYPE html>\n<html>\n <head>\n <title>Crawler Test Site</title>\n \n <meta content="en" HTTP-EQUIV="content-language"/>\n \n <link type="text/css" href="/css/app.css" rel="stylesheet"/>\n <link type="image/x-icon" href="/favicon.ico?r=1.6" rel="icon"/>\n <script type="text/javascript" src="/bower_components/jquery/jquery.min.js"></script>\n \n <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>\n \n\n \n <link rel="alternate" media'
# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
soup
<!DOCTYPE html>
<html>
<head>
<title>Crawler Test Site</title>
<meta content="en" http-equiv="content-language"/>
<link href="/css/app.css" rel="stylesheet" type="text/css"/>
...
</html>
The output is easier to interpret now that it was parsed with BeautifulSoup.
You can extract tag using the find()
or find_all()
methods.
soup.find('title')
Output:
<title>Crawler Test Site</title>
soup.find_all('meta')
Output:
[<meta content="en" http-equiv="content-language"/>,
<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>,
<meta content="nositelinkssearchbox" name="google"/>,
<meta content="0H-EBys8zSFUxmeV9xynoMCMePTzkUEL_lXrm9C4a8A" name="google-site-verification"/>]
Or, even select the attributes of the tag.
soup.find('meta', attrs={'name':'description'})
Output:
<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>
from bs4 import BeautifulSoup
import requests
# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)
# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
# Get the HTML tags
title = soup.find('title')
h1 = soup.find('h1')
description = soup.find('meta', attrs={'name':'description'})
meta_robots = soup.find('meta', attrs={'name':'robots'})
canonical = soup.find('link', {'rel': 'canonical'})
# Get the text from the HTML tags
title = title.get_text() if title else ''
h1 = h1.get_text() if h1 else ''
description = description['content'] if description else ''
meta_robots = meta_robots['content'] if meta_robots else ''
canonical = canonical['href'] if canonical else ''
# Print the tags
print('Title: ', title)
print('h1: ', h1)
print('description: ', description)
print('meta_robots: ', meta_robots)
print('canonical: ', canonical)
Output:
Title: Crawler Test Site
h1: Crawler Test Site
description: Default description XIbwNE7SSUJciq0/Jyty
meta_robots:
canonical:
Extracting all the links on a page
from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin
url = 'https://crawler-test.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
links = []
for link in soup.find_all('a', href=True):
full_url = urljoin(url, link['href']) # join domain to path
links.append(full_url)
# Show 5 links
links[:5]
Output:
['https://crawler-test.com/',
'https://crawler-test.com/mobile/separate_desktop',
'https://crawler-test.com/mobile/desktop_with_AMP_as_mobile',
'https://crawler-test.com/mobile/separate_desktop_with_different_h1',
'https://crawler-test.com/mobile/separate_desktop_with_different_title']
Improve the Request
Handle Errors
import requests
url = 'bad url'
try:
r = requests.get(url)
except Exception as e:
print(f'There was an error: {e}')
There was an error: Invalid URL 'bad url': No schema supplied. Perhaps you meant http://bad url?
Change User-Agent
import requests
url = 'https://www.reddit.com/r/python/top.json?limit=1&t=day'
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
r = requests.get(url, headers=headers)
Add Timeout to request
import requests
url = 'http://httpbin.org/basic-auth/user/pass'
try:
r = requests.get(url, timeout=0.1)
except Exception as e:
print(e)
r.status_code
HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /basic-auth/user/pass (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb03a7fa290>, 'Connection to httpbin.org timed out. (connect timeout=0.1)'))
401
Use Proxies
import requests
url = 'https://crawler-test.com/'
proxies = {
'http': '128.199.237.57:8080'
}
r = requests.get(url, proxies=proxies)
Add Headers to Requests
import requests
url = 'http://httpbin.org/headers'
access_token = {
'Authorization': 'Bearer {access_token}'
}
r = requests.get(url, headers=access_token)
r.json()
{'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Authorization': 'Bearer {access_token}',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-615aa0b3-5c680dcb575d50f22e9565eb'}}
Requests Session
The session object is useful when you need to make requests with parameters that persist through all the requests in a single session.
import requests
session = requests.Session()
url = 'https://httpbin.org/headers'
access_token = {
'Authorization': 'Bearer {access_token}'
}
session.headers.update(access_token)
r1 = session.get(url)
r2 = session.get(url)
print('r1: ', r1.json()['headers']['Authorization'])
print('r2: ', r2.json()['headers']['Authorization'])
r1: Bearer {access_token}
r2: Bearer {access_token}
Tutorials using Requests
- Wikipedia API with Python
- Read RSS Feed with Python and Beautiful Soup
- How to Post on LinkedIn API With Python
- Reddit API Without API Credentials
- Send Message With Slack API and Python
- What GMB Categories are the Competition Using?
- Python Libraries for SEO – Beginner Guide
- Random User-Agent With Python and BeautifulSoup (by JR Oakes)
- Get BERT Score for SEO (by Pierre Rouarch)
Interesting work from the community
- How to Check Status Codes of URLs in a Sitemap via Python (by Koray Tuğberk GÜBÜR)
- Automatically Find SEO Interlinking Opportunities with Python (by Greg Bernhardt)
- Yoast SEO API Python example with Requests + Pandas (by Erick Rumbold)
- How To Download Multiple Images In Python (by James Phoenix)
- Google Autosuggest Trends for Niche Keywords (by Stefan Neefischer)
- Asynchronous Web Scraping Python (by James Phoenix)
Conclusion
If you are looking for an alternative to the requests library, you may be interested in the requests-HTML library that provides some built-in HTML parsing options.
We now conclude the introduction on the Python Requests library.
This library is not only useful for web scraping, but also for web development and any other endeavour that uses APIs.

Sr SEO Specialist at Seek (Melbourne, Australia). Specialized in technical SEO. In a quest to programmatic SEO for large organizations through the use of Python, R and machine learning.