The Python requests library is one of the most popular libraries for making HTTP requests in Python.
In this tutorial, you will learn how to:
- Understand the structure of a request
- Make GET and POST requests
- Read and extract elements of the HTML of a web page
- Improve your requests
Let’s learn how to use Python Requests.
How to Use Python Requests
- Install the Python Requests Package
$ pip install requests
- Import the Requests Module
import requests
- Make a Request using the GET method
Use the GET method and store the response in a variable.
r = requests.get(url)
- Read the response using the response object’s attributes and methods
You can interact with the Python response object using its attributes (e.g. r.status_code) and methods (e.g. r.json()).
A complete minimal example combining these steps is shown below.
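Putting these steps together, here is a minimal end-to-end sketch using the crawler-test.com page that appears throughout this tutorial:
import requests
# Make a GET request
url = 'https://crawler-test.com/'
r = requests.get(url)
# Read the response using attributes and methods
print(r.status_code) # e.g. 200
print(r.headers['Content-Type']) # e.g. text/html;charset=utf-8
print(r.text[:100]) # first 100 characters of the HTML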
Python Requests Install
Install the latest version of the requests library using pip.
$ pip install requests
For this guide, you will also need to install the beautifulsoup4 package. The urllib.parse module used later in this tutorial is part of the Python standard library, so it does not need to be installed separately.
$ pip install beautifulsoup4
Import the Requests Module
To import the requests library in Python, use the import keyword.
import requests
Requests Methods
- get: Request data
- post: Publish data
- put: Replace data
- patch: Make partial changes to the data
- delete: Delete data
- head: Similar to a GET request, but returns only the response headers (no body)
- request: Create a request by passing the HTTP method as an argument (see the sketch below)
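As an illustration of the last item, the generic requests.request() function takes the HTTP method as its first argument and is equivalent to calling the dedicated helpers; here is a minimal sketch against the httpbin.org test endpoint:
import requests
# Generic interface: pass the HTTP method as a string
r = requests.request('GET', 'https://httpbin.org/get')
print(r.status_code)
# Equivalent call using the dedicated helper
r = requests.get('https://httpbin.org/get')
print(r.status_code)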
Get Requests
import requests
url = 'https://crawler-test.com/'
response = requests.get(url)
print('URL: ', response.url)
print('Status code: ', response.status_code)
print('HTTP header: ', response.headers)
Output:
URL: https://crawler-test.com/
Status code: 200
HTTP header: {'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sun, 03 Oct 2021 23:41:59 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8098', 'Connection': 'keep-alive'}
Post Requests
To send data in the body of the request, use the POST method and pass a dictionary to the data argument.
import requests
url = 'https://httpbin.org/post'
payload = {
'name':'Jean-Christophe',
'last_name':'Chouinard',
'website':'https://www.jcchouinard.com/'
}
response = requests.post(url, data=payload)
response.json()
Output:
{'args': {},
'data': '',
'files': {},
'form': {'last_name': 'Chouinard',
'name': 'Jean-Christophe',
'website': 'https://www.jcchouinard.com/'},
'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Content-Length': '85',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-615a4271-417e9fff3c75f47f3af9fde2'},
'json': None,
'origin': '149.167.130.162',
'url': 'https://httpbin.org/post'}
Response Methods and Attributes
The response object contains the server’s response to the HTTP request.
You can investigate the details of the Response object by using help().
import requests
url = 'https://crawler-test.com/'
response = requests.get(url)
help(response)
In this tutorial we will look at the following:
- text, data descriptor: Content of the response, in unicode.
- content, data descriptor: Content of the response, in bytes.
- url, attribute: URL of the request.
- status_code, attribute: Status code returned by the server.
- headers, attribute: HTTP headers returned by the server.
- history, attribute: List of response objects holding the history of the request (redirects).
- links, attribute: Parsed header links of the response, if any.
- json, method: Returns the JSON-encoded content of the response, if any.
Access the Response Methods and Attributes
The response from the request is an object whose methods and attributes you can access.
You can access the attributes using the object.attribute notation and the methods using the object.method() notation.
import requests
url = 'http://archive.org/wayback/available?url=jcchouinard.com'
response = requests.get(url)
response.text # access response data attributes and descriptors
response.json() # access response methods
{'url': 'jcchouinard.com',
'archived_snapshots': {'closest': {'status': '200',
'available': True,
'url': 'http://web.archive.org/web/20210930032915/https://www.jcchouinard.com/',
'timestamp': '20210930032915'}}}
Process the Response
Access the Python Requests JSON
In Python requests, the response.json() method lets you access the JSON content of the response. If the response body is not written in a JSON format, the decoder raises a requests.exceptions.JSONDecodeError exception.
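As a minimal sketch (assuming requests >= 2.27, where requests.exceptions.JSONDecodeError is exposed), you can guard the call like this, using the crawler-test.com page, which returns HTML rather than JSON:
import requests
url = 'https://crawler-test.com/' # returns HTML, not JSON
response = requests.get(url)
try:
    data = response.json()
except requests.exceptions.JSONDecodeError:
    print('The response is not valid JSON')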
Show Status Code
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.status_code
# 200
Get HTML of the page
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.text # get content as a string
r.content # get content as bytes
Show HTTP header
import requests
url = 'https://crawler-test.com/'
r = requests.get(url)
r.headers
{'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Tue, 05 Oct 2021 04:23:27 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8099', 'Connection': 'keep-alive'}
Show redirections
import requests
url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url)
for redirect in r.history:
    print(redirect.url, redirect.status_code)
print(r.url, r.status_code)
https://crawler-test.com/redirects/redirect_chain_allowed 301
https://crawler-test.com/redirects/redirect_chain_disallowed 301
https://crawler-test.com/redirects/redirect_target 200
Parse the HTML with Request and BeautifulSoup
Parsing with BeautifulSoup
from bs4 import BeautifulSoup
import requests
# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)
r.text[:500]
You can see that the raw text is a hard-to-interpret string.
'<!DOCTYPE html>\n<html>\n <head>\n <title>Crawler Test Site</title>\n \n <meta content="en" HTTP-EQUIV="content-language"/>\n \n <link type="text/css" href="/css/app.css" rel="stylesheet"/>\n <link type="image/x-icon" href="/favicon.ico?r=1.6" rel="icon"/>\n <script type="text/javascript" src="/bower_components/jquery/jquery.min.js"></script>\n \n <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>\n \n\n \n <link rel="alternate" media'
# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
soup
<!DOCTYPE html>
<html>
<head>
<title>Crawler Test Site</title>
<meta content="en" http-equiv="content-language"/>
<link href="/css/app.css" rel="stylesheet" type="text/css"/>
...
</html>
The output is easier to interpret now that it has been parsed with BeautifulSoup.
You can extract tags using the find() or find_all() methods.
soup.find('title')
Output:
<title>Crawler Test Site</title>
soup.find_all('meta')
Output:
[<meta content="en" http-equiv="content-language"/>,
<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>,
<meta content="nositelinkssearchbox" name="google"/>,
<meta content="0H-EBys8zSFUxmeV9xynoMCMePTzkUEL_lXrm9C4a8A" name="google-site-verification"/>]
You can also select a tag by its attributes.
soup.find('meta', attrs={'name':'description'})
Output:
<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>
from bs4 import BeautifulSoup
import requests
# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)
# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
# Get the HTML tags
title = soup.find('title')
h1 = soup.find('h1')
description = soup.find('meta', attrs={'name':'description'})
meta_robots = soup.find('meta', attrs={'name':'robots'})
canonical = soup.find('link', {'rel': 'canonical'})
# Get the text from the HTML tags
title = title.get_text() if title else ''
h1 = h1.get_text() if h1 else ''
description = description['content'] if description else ''
meta_robots = meta_robots['content'] if meta_robots else ''
canonical = canonical['href'] if canonical else ''
# Print the tags
print('Title: ', title)
print('h1: ', h1)
print('description: ', description)
print('meta_robots: ', meta_robots)
print('canonical: ', canonical)
Output:
Title: Crawler Test Site
h1: Crawler Test Site
description: Default description XIbwNE7SSUJciq0/Jyty
meta_robots:
canonical:
Extracting all the links on a page
from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin
url = 'https://crawler-test.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
links = []
for link in soup.find_all('a', href=True):
    full_url = urljoin(url, link['href']) # join domain to path
    links.append(full_url)
# Show 5 links
links[:5]
Output:
['https://crawler-test.com/',
'https://crawler-test.com/mobile/separate_desktop',
'https://crawler-test.com/mobile/desktop_with_AMP_as_mobile',
'https://crawler-test.com/mobile/separate_desktop_with_different_h1',
'https://crawler-test.com/mobile/separate_desktop_with_different_title']
Improve the Request
Query String Parameters
The query parameters allow you to customize your Python request by passing values to the query string. Most APIs require you to add query parameters to the request. This is the case with the Wikipedia API.
import requests
url = 'https://en.wikipedia.org/w/api.php'
params = {
'action': 'query',
'format': 'json',
'titles': 'Requests (software)',
'prop': 'extracts'
}
response = requests.get(url, params=params)
print('Request URL:', response.url)
To add query string parameters, pass a dictionary of parameters to the params argument. Here is what the request URL looks like.
# Result
Request URL: https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Requests+%28software%29&prop=extracts
data = response.json()
page = next(iter(data['query']['pages'].values()))
print(page['extract'][:73])
Handle Errors
import requests
url = 'bad url'
try:
    r = requests.get(url)
except Exception as e:
    print(f'There was an error: {e}')
There was an error: Invalid URL 'bad url': No schema supplied. Perhaps you meant http://bad url?
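The try/except above catches connection-level errors such as an invalid URL. To also treat HTTP error status codes (4xx and 5xx) as exceptions, you can call raise_for_status() on the response. A minimal sketch, using the httpbin.org status endpoint:
import requests
url = 'https://httpbin.org/status/404'
r = requests.get(url)
try:
    r.raise_for_status() # raises requests.exceptions.HTTPError for 4xx/5xx responses
except requests.exceptions.HTTPError as e:
    print(f'HTTP error: {e}')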
Change User-Agent
import requests
url = 'https://www.reddit.com/r/python/top.json?limit=1&t=day'
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
r = requests.get(url, headers=headers)
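To verify which User-Agent was actually sent, you can reuse the httpbin.org headers echo endpoint shown later in this tutorial; the User-Agent string below is just a made-up example:
import requests
headers = {'User-Agent': 'my-custom-agent/1.0'} # example value
r = requests.get('https://httpbin.org/headers', headers=headers)
print(r.json()['headers']['User-Agent'])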
Add Timeout to request
The timeout argument defines how many seconds to wait for the server before giving up and raising an exception.
import requests
url = 'http://httpbin.org/basic-auth/user/pass'
try:
    r = requests.get(url, timeout=0.1)
    print(r.status_code)
except Exception as e:
    print(e)
If the request times out, the error below is printed; if it completes in time, the status code is printed instead (401 here, since the endpoint requires basic authentication).
HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /basic-auth/user/pass (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb03a7fa290>, 'Connection to httpbin.org timed out. (connect timeout=0.1)'))
Use Proxies
import requests
url = 'https://crawler-test.com/'
proxies = {
'http': 'http://128.199.237.57:8080',
'https': 'http://128.199.237.57:8080' # the 'https' key is needed because the requested URL uses https
}
r = requests.get(url, proxies=proxies)
The proxy address above is only an example; replace it with a working proxy of your own.
Add Headers to Requests
import requests
url = 'http://httpbin.org/headers'
access_token = {
'Authorization': 'Bearer {access_token}'
}
r = requests.get(url, headers=access_token)
r.json()
{'headers': {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Authorization': 'Bearer {access_token}',
'Host': 'httpbin.org',
'User-Agent': 'python-requests/2.24.0',
'X-Amzn-Trace-Id': 'Root=1-615aa0b3-5c680dcb575d50f22e9565eb'}}
Requests Session
The session object is useful when you need to make requests with parameters that persist through all the requests in a single session.
import requests
session = requests.Session()
url = 'https://httpbin.org/headers'
access_token = {
'Authorization': 'Bearer {access_token}'
}
session.headers.update(access_token)
r1 = session.get(url)
r2 = session.get(url)
print('r1: ', r1.json()['headers']['Authorization'])
print('r2: ', r2.json()['headers']['Authorization'])
r1: Bearer {access_token}
r2: Bearer {access_token}
Handling Retries in Python Requests
To retry failed requests automatically, mount an HTTPAdapter configured with a Retry object on a Session. Requests that return one of the status codes in status_forcelist are retried up to total times, with a delay controlled by backoff_factor.
import requests
from requests.adapters import HTTPAdapter, Retry
s = requests.Session()
retries = Retry(total=3,
backoff_factor=0.1,
status_forcelist=[500,502,503,504])
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))
try:
    r = s.get('https://httpstat.us/500')
except Exception as e:
    print(type(e))
    print(e)
Other HTTP Methods
On top of GET and POST requests, the Python requests library lets you use other popular HTTP methods such as HEAD, PUT, DELETE, PATCH and OPTIONS.
requests.head('https://httpbin.org/get') # Get request HTTP header
requests.put('https://httpbin.org/put', data={'key':'value'}) # Create new resource with payload
requests.delete('https://httpbin.org/delete') # Delete resource
requests.patch('https://httpbin.org/patch', data={'key':'value'}) # Partial modification
requests.options('https://httpbin.org/get') # Specify communication options
What is the Python Requests Library?
The Python requests library is an HTTP library that allows users to send HTTP requests using Python. Its tagline, “Python HTTP for Humans”, represents the simplicity of the package well.
Tutorials using Requests
- Wikipedia API with Python
- Read RSS Feed with Python and Beautiful Soup
- How to Post on LinkedIn API With Python
- Reddit API Without API Credentials
- Send Message With Slack API and Python
- What GMB Categories are the Competition Using?
- Python Libraries for SEO – Beginner Guide
- Random User-Agent With Python and BeautifulSoup (by JR Oakes)
- Get BERT Score for SEO (by Pierre Rouarch)
Interesting work from the community
- How to Check Status Codes of URLs in a Sitemap via Python (by Koray Tuğberk GÜBÜR)
- Automatically Find SEO Interlinking Opportunities with Python (by Greg Bernhardt)
- Yoast SEO API Python example with Requests + Pandas (by Erick Rumbold)
- How To Download Multiple Images In Python (by James Phoenix)
- Google Autosuggest Trends for Niche Keywords (by Stefan Neefischer)
- Asynchronous Web Scraping Python (by James Phoenix)
Facts about Python Requests
- Author: Kenneth Reitz
- Language: Python
- Methods: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
- First release: 2011-02-14
Conclusion
If you are looking for an alternative to the requests library, you may be interested in the requests-HTML library, which provides built-in HTML parsing options.
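As a rough sketch (assuming the package is installed with pip install requests-html), the title extraction from earlier could look like this with requests-HTML:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://crawler-test.com/')
title = r.html.find('title', first=True) # parsing is built into the response object
print(title.text)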
The requests library is not only useful for web scraping, but also for web development and any other endeavour that uses APIs.
This concludes the introduction to the Python Requests library.

SEO Strategist at Tripadvisor, ex-Seek (Melbourne, Australia). Specialized in technical SEO. On a quest to bring programmatic SEO to large organizations through the use of Python, R and machine learning.