Python Requests Tutorial With Examples (and Video)

The Python requests library is one of the most widely used libraries for making HTTP requests in Python.

In this tutorial, you will learn how to:

  • Understand the structure of a request
  • Make GET and POST requests
  • Read and extract elements of the HTML of a web page
  • Improve your requests

Let’s learn how to use Python Requests.


How to Use Python Requests

  1. Install the Python Requests Package

    $ pip install requests

  2. Import the Requests Module

    import requests

  3. Make a Request using the GET method

    Use the GET method and store the response in a variable.
    r = requests.get(url)

  4. Read the response using the Response object’s attributes and methods

    You can interact with the Python Response object using its attributes (e.g. r.status_code) and methods (e.g. r.json()), as shown in the example below.
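
Putting these four steps together, here is a minimal end-to-end sketch that uses the same https://crawler-test.com/ URL as the rest of this tutorial:

import requests

# Step 3: make the GET request
url = 'https://crawler-test.com/'
r = requests.get(url)

# Step 4: read the response
print(r.status_code)
print(r.text[:100])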

Python Requests Install

Install the latest version of python requests using pip.

$ pip install requests

For this guide, you will need to have Python installed, along with the BeautifulSoup package. The urllib module used later in this tutorial is part of the Python standard library, so it does not need to be installed separately.

$ pip install beautifulsoup4

Import the Requests Module

To import the requests library in Python, use the import keyword.

import requests

Requests Methods

  • get: Request data
  • post: Publish data
  • put: Replace data
  • patch: Make partial changes to the data
  • delete: Delete data
  • head: Similar to a GET request, but returns only the HTTP headers, without the body
  • request: Create a request by passing the HTTP method as an argument (see the sketch below)
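
As a quick illustration of the last item, the generic requests.request() function takes the HTTP method as its first argument. Here is a minimal sketch, using the httpbin.org test endpoint that appears later in this tutorial:

import requests

# Equivalent to requests.get('https://httpbin.org/get')
r = requests.request('GET', 'https://httpbin.org/get')

print(r.status_code)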

Get Requests

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

print('URL: ', response.url)
print('Status code: ', response.status_code)
print('HTTP header: ', response.headers)

Output:

URL:  https://crawler-test.com/
Status code:  200
HTTP header:  {'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sun, 03 Oct 2021 23:41:59 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8098', 'Connection': 'keep-alive'}

Post Requests

import requests

url = 'https://httpbin.org/post'

payload = {
    'name':'Jean-Christophe',
    'last_name':'Chouinard',
    'website':'https://www.jcchouinard.com/'
    }

response = requests.post(url, data = payload)

response.json()

Output:

{'args': {},
 'data': '',
 'files': {},
 'form': {'last_name': 'Chouinard',
  'name': 'Jean-Christophe',
  'website': 'https://www.jcchouinard.com/'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '85',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615a4271-417e9fff3c75f47f3af9fde2'},
 'json': None,
 'origin': '149.167.130.162',
 'url': 'https://httpbin.org/post'}

Response Methods and Attributes

The response object contains the server’s response to the HTTP request.

You can investigate the details of the Response object by using help().

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

help(response)

In this tutorial we will look at the following:

  • text (data descriptor): Content of the response, in unicode
  • content (data descriptor): Content of the response, in bytes
  • url (attribute): URL of the request
  • status_code (attribute): Status code returned by the server
  • headers (attribute): HTTP headers returned by the server
  • history (attribute): List of Response objects holding the history of the request
  • links (attribute): Parsed header links of the response, if any
  • json (method): JSON-encoded content of the response, if any

Access the Response Methods and Attributes

The response returned by the request is an object whose attributes and methods you can access.

You can access the attributes using the object.attribute notation and the methods using the object.method() notation.

import requests

url = 'http://archive.org/wayback/available?url=jcchouinard.com'
response = requests.get(url)

response.text # access response data attributes and descriptors
response.json() # access response methods

Output:

{'url': 'jcchouinard.com',
 'archived_snapshots': {'closest': {'status': '200',
   'available': True,
   'url': 'http://web.archive.org/web/20210930032915/https://www.jcchouinard.com/',
   'timestamp': '20210930032915'}}}

Process the Response

Access the Python Requests JSON

In Python requests, the response.json() method lets you access the JSON object of the response. If the response is not valid JSON, the JSON decoder raises a requests.exceptions.JSONDecodeError exception.
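
Here is a minimal sketch of how you might guard against a non-JSON response, assuming requests 2.27 or later, where requests.exceptions.JSONDecodeError is available:

import requests

url = 'https://crawler-test.com/' # returns HTML, not JSON
r = requests.get(url)

try:
    data = r.json()
except requests.exceptions.JSONDecodeError:
    print('The response does not contain valid JSON')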

Show Status Code

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.status_code
# 200

Get HTML of the page

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.text # get content as a string
r.content # get content as bytes

Show HTTP header

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)
r.headers

Output:

{'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Tue, 05 Oct 2021 04:23:27 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8099', 'Connection': 'keep-alive'}

Show redirections

import requests

url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url)

for redirect in r.history:
    print(redirect.url, redirect.status_code)
print(r.url, r.status_code)

Output:

https://crawler-test.com/redirects/redirect_chain_allowed 301
https://crawler-test.com/redirects/redirect_chain_disallowed 301
https://crawler-test.com/redirects/redirect_target 200

Parse the HTML with Request and BeautifulSoup

Parsing with BeautifulSoup

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

r.text[:500]

You can see that the raw text is a hard-to-interpret string.

'<!DOCTYPE html>\n<html>\n  <head>\n    <title>Crawler Test Site</title>\n    \n      <meta content="en" HTTP-EQUIV="content-language"/>\n         \n    <link type="text/css" href="/css/app.css" rel="stylesheet"/>\n    <link type="image/x-icon" href="/favicon.ico?r=1.6" rel="icon"/>\n    <script type="text/javascript" src="/bower_components/jquery/jquery.min.js"></script>\n    \n      <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>\n    \n\n    \n        <link rel="alternate" media'

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
soup

Output:

<!DOCTYPE html>

<html>
<head>
<title>Crawler Test Site</title>
<meta content="en" http-equiv="content-language"/>
<link href="/css/app.css" rel="stylesheet" type="text/css"/>
...
</html>

The output is easier to interpret now that it has been parsed with BeautifulSoup.

You can extract tags using the find() or find_all() methods.

soup.find('title')

Output:

<title>Crawler Test Site</title>

soup.find_all('meta')

Output:

[<meta content="en" http-equiv="content-language"/>,
 <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>,
 <meta content="nositelinkssearchbox" name="google"/>,
 <meta content="0H-EBys8zSFUxmeV9xynoMCMePTzkUEL_lXrm9C4a8A" name="google-site-verification"/>]

You can also select a tag by its attributes.

soup.find('meta', attrs={'name':'description'})

Output:

<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>

Getting main SEO tags from a webpage

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')

# Get the HTML tags
title = soup.find('title')
h1 = soup.find('h1')
description = soup.find('meta', attrs={'name':'description'})
meta_robots =  soup.find('meta', attrs={'name':'robots'})
canonical = soup.find('link', {'rel': 'canonical'})

# Get the text from the HTML tags
title = title.get_text() if title else ''
h1 = h1.get_text() if h1 else ''
description = description['content'] if description else ''
meta_robots =  meta_robots['content'] if meta_robots else ''
canonical = canonical['href'] if canonical else ''

# Print the tags
print('Title: ', title)
print('h1: ', h1)
print('description: ', description)
print('meta_robots: ', meta_robots)
print('canonical: ', canonical)

Output:

Title:  Crawler Test Site
h1:  Crawler Test Site
description:  Default description XIbwNE7SSUJciq0/Jyty
meta_robots:  
canonical:  

Extracting all the links on a page

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

url = 'https://crawler-test.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

links = []
for link in soup.find_all('a', href=True):
    full_url = urljoin(url, link['href']) # join domain to path
    links.append(full_url)

# Show 5 links
links[:5]

Output:

['https://crawler-test.com/',
 'https://crawler-test.com/mobile/separate_desktop',
 'https://crawler-test.com/mobile/desktop_with_AMP_as_mobile',
 'https://crawler-test.com/mobile/separate_desktop_with_different_h1',
 'https://crawler-test.com/mobile/separate_desktop_with_different_title']

Improve the Request

Query String Parameters

Query string parameters allow you to customize your Python request by passing values to the query string. Most APIs require you to add query parameters to the request. This is the case with the Wikipedia API.

import requests
 
url = 'https://en.wikipedia.org/w/api.php'

params = {
        'action': 'query',
        'format': 'json',
        'titles': 'Requests (software)',
        'prop': 'extracts'
    }
 
response = requests.get(url, params=params)

print('Request URL:', response.url)

To add query string parameters, pass a dictionary of parameters to the params argument. Here is what the request URL looks like.

Output:

Request URL: https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Requests+%28software%29&prop=extracts

You can then read the JSON response and print the beginning of the page extract.

data = response.json()

page = next(iter(data['query']['pages'].values()))
print(page['extract'][:73])

Handle Errors

import requests

url = 'bad url'

try:
    r = requests.get(url)
except Exception as e:
    print(f'There was an error: {e}')

Output:

There was an error: Invalid URL 'bad url': No schema supplied. Perhaps you meant http://bad url?
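
A bad URL raises an exception when the request is made, but a 4xx or 5xx response does not raise anything by itself. If you also want to treat error status codes as exceptions, you can call the response's raise_for_status() method, which raises requests.exceptions.HTTPError for unsuccessful status codes. Here is a minimal sketch, using httpbin.org to simulate a 404 response:

import requests

r = requests.get('https://httpbin.org/status/404')

try:
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(f'HTTP error: {e}')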

Change User-Agent

import requests 

url = 'https://www.reddit.com/r/python/top.json?limit=1&t=day'

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}

r = requests.get(url, headers=headers)

Add Timeout to request

import requests

url = 'http://httpbin.org/basic-auth/user/pass'

try:
    r = requests.get(url, timeout=0.1)
    print(r.status_code)
except Exception as e:
    print(e)

Output:

HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /basic-auth/user/pass (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb03a7fa290>, 'Connection to httpbin.org timed out. (connect timeout=0.1)'))

If the server responds within the timeout, the status code is printed instead (401 here, since no credentials are supplied).

Use Proxies

import requests 

url = 'https://crawler-test.com/'

proxies = {
    'http': 'http://128.199.237.57:8080',
    'https': 'http://128.199.237.57:8080' # the target URL uses https, so an https mapping is needed for the proxy to apply
}

r = requests.get(url, proxies=proxies)

Add Headers to Requests

import requests 

url = 'http://httpbin.org/headers'

access_token = {
    'Authorization': 'Bearer {access_token}'
    }

r = requests.get(url, headers=access_token)
r.json()

Output:

{'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Authorization': 'Bearer {access_token}',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615aa0b3-5c680dcb575d50f22e9565eb'}}

Requests Session

The session object is useful when you need to make requests with parameters that persist through all the requests in a single session.

import requests

session = requests.Session()

url = 'https://httpbin.org/headers'

access_token = {
    'Authorization': 'Bearer {access_token}'
    }

session.headers.update(access_token)

r1 = session.get(url)
r2 = session.get(url)

print('r1: ', r1.json()['headers']['Authorization'])
print('r2: ', r2.json()['headers']['Authorization'])

Output:

r1:  Bearer {access_token}
r2:  Bearer {access_token}

Handling Retries in Python Requests

import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()

# Retry up to 3 times on 500, 502, 503 and 504 responses,
# waiting backoff_factor * (2 ** (retry number - 1)) seconds between attempts
retries = Retry(total=3,
                backoff_factor=0.1,
                status_forcelist=[500,502,503,504])

# Apply the retry strategy to every http:// and https:// request made through the session
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))


try:
    r = s.get('https://httpstat.us/500')
except Exception as e:
    print(type(e))
    print(e)

Other HTTP Methods

On top of GET and POST requests, the requests library lets you use other popular HTTP methods such as HEAD, PUT, DELETE, PATCH and OPTIONS.

requests.head('https://httpbin.org/get') # Get request HTTP header
requests.put('https://httpbin.org/put', data={'key':'value'})  # Create new resource with payload
requests.delete('https://httpbin.org/delete') # Delete resource
requests.patch('https://httpbin.org/patch', data={'key':'value'}) # Partial modification
requests.options('https://httpbin.org/get') # Specify communication options

What is the Python Requests Library?

The Python requests library, also known as python requests, is an HTTP library that allows users to send HTTP requests using Python. Its tagline, “Python HTTP for Humans”, captures the simplicity of the package well.


Facts about Python Requests

  • Author: Kenneth Reitz
  • Language: Python
  • Methods: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
  • First release: 2011-02-14

Conclusion

If you are looking for an alternative to the requests library, you may be interested in the requests-HTML library that provides some built-in HTML parsing options.
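
As a rough sketch of what that looks like, assuming the requests-html package is installed (pip install requests-html):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://crawler-test.com/')

# HTML parsing is built into the response object
print(r.html.find('title', first=True).text)
print(list(r.html.absolute_links)[:5])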

This library is not only useful for web scraping, but also for web development and any other endeavour that uses APIs.

This concludes our introduction to the Python Requests library.
