How to Use Python Requests Library (Example and Video) – Guided Tutorial

The Python requests library is a popular third-party package for sending HTTP requests and handling their responses in Python.

Python Requests Example

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.text

In this tutorial, you will learn how to use the Python requests module with examples:

  • Understand the structure of a request
  • Make GET and POST requests
  • Read and extract elements of the HTML of a web page
  • Improve your requests

What is the Python Requests Library?

The Python requests library, also known as python requests, is an HTTP library that allows users to send HTTP requests using Python. Its tagline, “Python HTTP for Humans”, captures the simplicity of the package well.


Let’s learn how to use Python Requests.

How to Use Python Requests

Follow these steps to use the Python requests module.

  1. Install the Python Requests Package

    $ pip install requests

  2. Import the Requests Module

    import requests

  3. Make a Request using the GET method

    Use the GET method and store the response in a variable.
    r = requests.get(url)

  4. Read the response using request’s attributes and methods

    You can interact with the Python request object using its attributes (e.g. r.status_code) and methods (e.g. r.json()).

Download and Install Python Requests

Use pip to install the latest version of python requests.

$ pip install requests

For this guide, you will need to have Python installed, along with the BeautifulSoup parsing library used later in the tutorial. The urllib module used to join URLs is part of the Python standard library and does not need to be installed separately.

$ pip install beautifulsoup4

Import the Request Module

To import the requests library in Python, use the import keyword.

import requests

Python Requests Functions

Below are listed the Python requests functions:

  • get: Sends a GET request to a given URL. E.g. get(url, parameters, arguments)
  • post: Sends a POST request to publish specified data to a given URL. E.g. post(url, data, json, arguments)
  • put: Sends a PUT request to replace data at a given URL. E.g. put(url, data, arguments)
  • patch: Sends a PATCH request to make partial changes to the data of a given URL. E.g. patch(url, data, arguments)
  • delete: Sends a DELETE request to delete data from a given URL. E.g. delete(url, arguments)
  • head: Sends a HEAD request to a given URL. This is similar to a GET request, but without the body. E.g. head(url, arguments)
  • options: Sends an OPTIONS request to specify communication options for a given URL. E.g. options(url)
  • Request: Creates a Request object by specifying the HTTP method to use.

We will now view some examples of requests functions in Python.
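For instance, the Request class listed above lets you build a request explicitly and inspect it before sending it. A minimal sketch (the URL and parameter are just examples):

```python
import requests

# Build a Request object without sending it
req = requests.Request('GET', 'https://crawler-test.com/', params={'q': 'test'})

# prepare() resolves the final URL, method, headers and body
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://crawler-test.com/?q=test
```

A prepared request can then be sent with requests.Session().send(prepared), which is useful when you need to inspect or modify a request before it goes out.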

Python Get Requests

The python requests get() function sends GET requests to a web server for a given URL, set of parameters and arguments. The get() function follows this pattern:

get(url, parameters, arguments)

How to Send Get Requests in Python

To send GET requests in Python, use the get() function with the URL that you want to retrieve information from.

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

print('URL: ', response.url)
print('Status code: ', response.status_code)
print('HTTP header: ', response.headers)

Output:

URL:  https://crawler-test.com/
Status code:  200
HTTP header:  {'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sun, 03 Oct 2021 23:41:59 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8098', 'Connection': 'keep-alive'}

Post Requests

The python requests post() function sends POST requests to a web server to publish specified data to a given URL. The post() function follows this pattern:

post(url, data, json, arguments)

How to Send Post Requests in Python

To send POST requests in Python, use the post() function. Add the URL and a dictionary representation of the data to be published to the data parameter.

import requests

url = 'https://httpbin.org/post'

payload = {
    'name':'Jean-Christophe',
    'last_name':'Chouinard',
    'website':'https://www.jcchouinard.com/'
    }

response = requests.post(url, data = payload)

response.json()

Output:

{'args': {},
 'data': '',
 'files': {},
 'form': {'last_name': 'Chouinard',
  'name': 'Jean-Christophe',
  'website': 'https://www.jcchouinard.com/'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '85',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615a4271-417e9fff3c75f47f3af9fde2'},
 'json': None,
 'origin': '149.167.130.162',
 'url': 'https://httpbin.org/post'}

Python Response Object’s Methods and Attributes

The Python Response object contains the server’s response to the HTTP request.

You can investigate the details of the Response object by using help().

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

help(response)

In this tutorial we will look at the following:

  • text, data descriptor : Content of the response, in unicode.
  • content, data descriptor : Content of the response, in bytes.
  • url, attribute : URL of the request
  • status_code, attribute : Status code returned by the server
  • headers, attribute : HTTP headers returned by the server
  • history, attribute : List of response objects holding the history of the request
  • links, attribute : Returns the parsed header links of the response, if any.
  • json, method : Returns the json-encoded content of a response, if any.

Access the Response Methods and Attributes

The response from the request is an object whose methods and attributes you can access.

You can access the attributes using the object.attribute notation and the methods using the object.method() notation.

import requests

url = 'http://archive.org/wayback/available?url=jcchouinard.com'
response = requests.get(url)

response.text # access response data atributes and descriptors
response.json() # access response methods
{'url': 'jcchouinard.com',
 'archived_snapshots': {'closest': {'status': '200',
   'available': True,
   'url': 'http://web.archive.org/web/20210930032915/https://www.jcchouinard.com/',
   'timestamp': '20210930032915'}}}

Process the Python Response

How to Access the JSON of Python Requests

In Python requests, the response.json() method allows you to access the JSON object of the response. If the response body is not valid JSON, the decoder raises the requests.exceptions.JSONDecodeError exception.
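A defensive sketch of this, using a hypothetical safe_json() helper; catching ValueError covers both the newer requests.exceptions.JSONDecodeError and the plain json.JSONDecodeError raised by older versions of requests:

```python
import requests

def safe_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON.
    (safe_json is a hypothetical helper, not part of requests.)"""
    try:
        return response.json()
    except ValueError:  # JSONDecodeError is a ValueError subclass
        return None
```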

How to Show the Status Code of a Python Request

To show the status code returned by a Python get() request use the status_code attribute of the response Object.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.status_code
# 200

How to Get the HTML of the Page with Python Requests

To get the HTML of a web page using Python requests, make a GET request to a given URL, then use the text attribute of the response object to get the HTML as Unicode, or the content attribute to get it as bytes.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.text # get content as a string
r.content # get content as bytes

How to Show the HTTP header of a GET Request

To show the HTTP headers used in a Python GET request, use the headers attribute of the response Object.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)
r.headers

Output:

{'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Tue, 05 Oct 2021 04:23:27 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8099', 'Connection': 'keep-alive'}

How to Show HTTP Redirects with Python Requests

To show HTTP redirects that happened during a Python get() request, use the history attribute of the response Object. Create a for loop on the response.history and get the .url and .status_code attributes of each element of the history.

import requests

url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url)

for redirect in r.history:
    print(redirect.url, redirect.status_code)
print(r.url, r.status_code)

Output:

https://crawler-test.com/redirects/redirect_chain_allowed 301
https://crawler-test.com/redirects/redirect_chain_disallowed 301
https://crawler-test.com/redirects/redirect_target 200

Parse the HTML with Requests and BeautifulSoup

To parse the HTML returned by Python requests, use the lxml or the BeautifulSoup library on the response object.

BeautifulSoup is a Python library that allows you to parse HTML and XML to pull data from them. It can be used to parse the HTML returned in the Python Response object.

How the Python Response Object Returns the HTML

The Python Response object returns the HTML from the URL passed in the get() request as Unicode or bytes.

This raw textual format makes it hard to extract information from the HTML.

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

r.text[:500]

Output:

'<!DOCTYPE html>\n<html>\n  <head>\n    <title>Crawler Test Site</title>\n    \n      <meta content="en" HTTP-EQUIV="content-language"/>\n         \n    <link type="text/css" href="/css/app.css" rel="stylesheet"/>\n    <link type="image/x-icon" href="/favicon.ico?r=1.6" rel="icon"/>\n    <script type="text/javascript" src="/bower_components/jquery/jquery.min.js"></script>\n    \n      <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>\n    \n\n    \n        <link rel="alternate" media'

How to Parse HTML with BeautifulSoup

To parse the HTML returned from a get() request, pass the response.text attribute to the BeautifulSoup class of the bs4 library. Use the ‘html.parser’ argument to parse the HTML.

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
soup

This returns a soup object that you can use to extract data.

<!DOCTYPE html>

<html>
<head>
<title>Crawler Test Site</title>
<meta content="en" http-equiv="content-language"/>
<link href="/css/app.css" rel="stylesheet" type="text/css"/>
...
</html>

The output is easier to interpret now that it has been parsed with BeautifulSoup.

You can extract tags using the find() or find_all() methods.

soup.find('title')

Output:

<title>Crawler Test Site</title>

soup.find_all('meta')

Output:

[<meta content="en" http-equiv="content-language"/>,
 <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>,
 <meta content="nositelinkssearchbox" name="google"/>,
 <meta content="0H-EBys8zSFUxmeV9xynoMCMePTzkUEL_lXrm9C4a8A" name="google-site-verification"/>]

You can even select the attributes of a tag.

soup.find('meta', attrs={'name':'description'})

Output:

<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>

How to Get the Main SEO tags from a Webpage

To extract the main SEO tags from a web page, use requests along with the BeautifulSoup parsing library. The find() method of the soup object will allow you to extract HTML tags such as the H1, the title, the meta description, and other important SEO tags.

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')

# Get the HTML tags
title = soup.find('title')
h1 = soup.find('h1')
description = soup.find('meta', attrs={'name':'description'})
meta_robots =  soup.find('meta', attrs={'name':'robots'})
canonical = soup.find('link', {'rel': 'canonical'})

# Get the text from the HTML tags
title = title.get_text() if title else ''
h1 = h1.get_text() if h1 else ''
description = description['content'] if description else ''
meta_robots =  meta_robots['content'] if meta_robots else ''
canonical = canonical['href'] if canonical else ''

# Print the tags
print('Title: ', title)
print('h1: ', h1)
print('description: ', description)
print('meta_robots: ', meta_robots)
print('canonical: ', canonical)

Output:

Title:  Crawler Test Site
h1:  Crawler Test Site
description:  Default description XIbwNE7SSUJciq0/Jyty
meta_robots:  
canonical:  

How to Extract All the Links on a Page

To extract all the links on a web page, loop through the anchor tags returned by find_all('a', href=True) and join each relative URL to the domain with urljoin.

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

url = 'https://crawler-test.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

links = []
for link in soup.find_all('a', href=True):
    full_url = urljoin(url, link['href']) # join domain to path
    links.append(full_url)

# Show 5 links
links[:5]

Output:

['https://crawler-test.com/',
 'https://crawler-test.com/mobile/separate_desktop',
 'https://crawler-test.com/mobile/desktop_with_AMP_as_mobile',
 'https://crawler-test.com/mobile/separate_desktop_with_different_h1',
 'https://crawler-test.com/mobile/separate_desktop_with_different_title']

Python Requests Headers

HTTP request headers are used to pass additional information with an HTTP request or response, without changing the behaviour of the request.

How to Add Request Headers to a Python Request

To add request headers to your GET and POST requests, pass a dictionary to the headers parameter of the get() and post() functions.

import requests 

url = 'http://httpbin.org/headers'

# Add a custom header to a GET request
r = requests.get(url, headers={'Content-Type': 'text/plain'})

# Add a custom header to a POST request
r = requests.post(url, headers={'Authorization': 'Bearer {access_token}'})

How to Add an Access Token to the Headers of the Request

To add an access token to a Python request, pass a dictionary containing the Authorization header to the headers parameter of the get() request.

import requests 

url = 'http://httpbin.org/headers'

access_token = {
    'Authorization': 'Bearer {access_token}'
    }

r = requests.get(url, headers=access_token)
r.json()

Output:

{'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Authorization': 'Bearer {access_token}',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615aa0b3-5c680dcb575d50f22e9565eb'}}

Query String Parameters

The query parameters allow you to customize your Python request by passing values to the query string parameters.

Most API requests require you to add query parameters to the request. This is the case with the Wikipedia API.

How to Add Parameters to the URL of a Python Request

To add query string parameters to a Python request, pass a dictionary of parameters to the params argument. Here is what the request URL looks like.

import requests
 
url = 'https://en.wikipedia.org/w/api.php'

params = {
        'action': 'query',
        'format': 'json',
        'titles': 'Requests (software)',
        'prop': 'extracts'
    }
 
response = requests.get(url, params=params)

print('Request URL:', response.url)

Output:

Request URL: https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Requests+%28software%29&prop=extracts

You can then parse the JSON response:

data = response.json()

page = next(iter(data['query']['pages'].values()))
print(page['extract'][:73])

How to Handle Exception Errors with Requests

To deal with exceptions raised using Python requests, surround your request with the try and except statements.

import requests

url = 'bad url'

try:
    r = requests.get(url)
except Exception as e:
    print(f'There was an error: {e}')

Output:

There was an error: Invalid URL 'bad url': No schema supplied. Perhaps you meant http://bad url?

How to Change User-Agent in Your Python Request

To change the user-agent of your Python request, pass a dictionary to the headers parameter of the get() request.

import requests 

url = 'https://www.reddit.com/r/python/top.json?limit=1&t=day'

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}

r = requests.get(url, headers=headers)

How to Add Timeouts to a Request in Python

To add a timeout to a Python request, pass a float value to the timeout parameter of the get() request.

import requests

url = 'https://httpbin.org/delay/3'

try:
    r = requests.get(url, timeout=0.1)
except Exception as e:
    print(e)

Output (truncated):

HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /delay/3 (Caused by ConnectTimeoutError(..., 'Connection to httpbin.org timed out. (connect timeout=0.1)'))

How to use Proxies with Python Requests

To use proxies with Python requests, pass a dictionary to the proxies parameter of the get() request.

import requests 

url = 'https://crawler-test.com/'

proxies = {
    'http': '128.199.237.57:8080'
}

r = requests.get(url, proxies=proxies)

You can find free proxies on proxyscrape (though they may already be blocked). If you start to scale, however, you will need a premium proxy service.
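If your proxy requires authentication, you can embed the credentials in the proxy URL. A sketch of the dictionary shape (the host, port, and credentials below are all placeholders):

```python
# All values are placeholders: replace them with your proxy provider's details
proxies = {
    'http': 'http://user:password@proxy.example.com:8080',
    'https': 'http://user:password@proxy.example.com:8080',
}

# The dictionary is then passed to the proxies parameter:
# r = requests.get(url, proxies=proxies)
```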

Python Requests Sessions

The Python requests Session() object is used to make requests with parameters that persist through all the requests in a single session.

How to use the Requests Session Object (Example)

# Request Session example
import requests
 
url = 'https://httpbin.org/headers'

# Create HTTP Headers
access_token = {
    'Authorization': 'Bearer {access_token}'
    }

with requests.Session() as session:
 
    # Add HTTP Headers to the session
    session.headers.update(access_token)
    
    # Make First Request with session.get()
    r1 = session.get(url)

    # Make Second Request
    r2 = session.get(url)

    # Show HTTP Headers
    print('r1: ', r1.json()['headers']['Authorization'])
    print('r2: ', r2.json()['headers']['Authorization'])

Output:

r1:  Bearer {access_token}
r2:  Bearer {access_token}

How to Retry Failed Python Requests

To retry failed Python requests, mount an HTTPAdapter configured with a Retry object on a Session object. The adapter retries requests that return one of the listed status codes, waiting longer between each attempt (backoff).

import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()

retries = Retry(total=3,
                backoff_factor=0.1,
                status_forcelist=[500,502,503,504])
                
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))


try:
    r = s.get('https://httpstat.us/500')
except Exception as e:
    print(type(e))
    print(e)

How To Log All Requests From The Python Request Library

Use the logging library to log all requests and set the level keyword of the basicConfig() method to logging.DEBUG.

import requests
import logging
logging.basicConfig(level=logging.DEBUG)

urls = [
    'https://www.crawler-test.com',
    'https://www.crawler-test.com/status_codes/status_500',
    'https://www.crawler-test.com/mobile/separate_desktop'
]

for url in urls:
    r = requests.get(url)

Other HTTP Methods in the Requests Module

On top of GET and POST requests, the Python requests library allows you to use other popular HTTP methods such as HEAD, PUT, DELETE, PATCH and OPTIONS.

requests.head('https://httpbin.org/get') # Get request HTTP header
requests.put('https://httpbin.org/put', data={'key':'value'})  # Create new resource with payload
requests.delete('https://httpbin.org/delete') # Delete resource
requests.patch('https://httpbin.org/patch', data={'key':'value'}) # Partial modification
requests.options('https://httpbin.org/get') # Specify communication options

Python Requests Best Practices

There are best practices you should follow when using the Python requests library:

  1. Always use timeouts to avoid your code hanging while a web page loads.
  2. Log each request and response (timestamp, URL, status code, etc.) so you can debug when your code breaks.
  3. Handle potential exceptions using try/except to define what should happen if the code breaks.
  4. Write unit tests using try/except to check that important portions of the page exist. If the page changes in the future, you will be alerted before the code breaks.
  5. Use the head() function when you don’t need to fetch the body of the page, to improve request performance.
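These practices can be combined into a small wrapper function. A minimal sketch, assuming a hypothetical fetch() helper and an example timeout value:

```python
import logging
import requests

logging.basicConfig(level=logging.INFO)

def fetch(url, timeout=5):
    """Hypothetical wrapper: timeout, logging and exception handling in one place."""
    try:
        r = requests.get(url, timeout=timeout)        # practice 1: always set a timeout
        logging.info('%s -> %s', url, r.status_code)  # practice 2: log each request
        return r
    except requests.exceptions.RequestException as e:
        logging.error('%s failed: %s', url, e)        # practice 3: handle exceptions
        return None
```

Returning None on failure lets the caller decide whether to retry, skip, or abort.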

Python Requests Cheatsheet

This cheatsheet shows you commonly used requests functions in Python along with what each argument does. Note that optional arguments defined by **kwargs are described in the “Common Parameters for all Methods” section.

Requests Functions

GET Function

requests.get(url, params=None, **kwargs)

  • url: The URL to send the GET request to.
  • params: Dictionary or bytes to be sent in the query string of the request URL.
  • **kwargs: Additional optional keyword arguments.

POST Function

requests.post(url, data=None, json=None, **kwargs)

  • url: The URL to send the POST request to.
  • data: Dictionary, bytes, or file-like object to send in the body of the request.
  • json: JSON data to send in the body of the request.
  • **kwargs: Additional optional keyword arguments.

PUT Function

requests.put(url, data=None, **kwargs)

  • url: The URL to send the PUT request to.
  • data: Dictionary, bytes, or file-like object to send in the body of the request.
  • **kwargs: Additional optional keyword arguments.

DELETE Function

requests.delete(url, **kwargs)

  • url: The URL to send the DELETE request to.
  • **kwargs: Additional optional keyword arguments.

HEAD Function

requests.head(url, **kwargs)

  • url: The URL to send the HEAD request to.
  • **kwargs: Additional optional keyword arguments.

OPTIONS Function

requests.options(url, **kwargs)

  • url: The URL to send the OPTIONS request to.
  • **kwargs: Additional optional keyword arguments.

PATCH Function

requests.patch(url, data=None, **kwargs)

  • url: The URL to send the PATCH request to.
  • data: Dictionary, bytes, or file-like object to send in the body of the request.
  • **kwargs: Additional optional keyword arguments.

Common Parameters for all Requests Functions

  • params: Dictionary or bytes to be sent in the query string of the request URL.
  • data: Dictionary, bytes, or file-like object to send in the body of the request.
  • json: JSON data to send in the body of the request.
  • headers: Dictionary of HTTP headers to send with the request.
  • cookies: Dictionary or CookieJar object to send with the request.
  • files: Dictionary of ‘name’: file-like-objects for multipart encoding upload.
  • auth: Authentication credentials as a tuple or an object returned by requests.auth module.
  • timeout: Timeout value in seconds for the request.
  • allow_redirects: Whether to follow redirects or not.
  • proxies: Dictionary mapping protocol to the URL of the proxy.
  • verify: Either a boolean, string, or a Requests SSL/TLS Certificate to verify the request.
  • stream: Whether to enable streaming of the response content.
  • cert: SSL/TLS client certificate file (.pem) or tuple of both the certificate and key files.

Response Methods and Attributes

  • response.close(): Close the connection to the server
  • response.iter_content(): Iterate over the response
  • response.iter_lines(): Iterate over each line in the response
  • response.json(): Return the JSON-encoded content of the response. Raises an error if the body is not JSON
  • response.raise_for_status(): Raise an HTTPError if the request returned an error status code.
  • response.headers: Return a dictionary of the response headers.
  • response.text: Return the response content as a string.
  • response.content: Return the response content as bytes.
  • response.url: Return the URL of the response.
  • response.encoding: Return the encoding of the response.
  • response.ok: Return True if the response status code is less than 400
  • response.cookies: Return a dictionary of the response cookies.
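As an example of the raise_for_status() entry above, a common pattern is to fail fast on HTTP error codes (check_response() is a hypothetical helper):

```python
import requests

def check_response(response):
    # Raises requests.exceptions.HTTPError for 4xx/5xx status codes,
    # otherwise returns the body as text.
    response.raise_for_status()
    return response.text
```

Calling raise_for_status() right after the request means a 404 or 500 surfaces as an exception instead of silently flowing through your parsing code.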


Facts about Python Requests

  • Author: Kenneth Reitz
  • Language: Python
  • Functions: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
  • First release: 2011-02-14

Request Properties

  • apparent_encoding: Return the apparent encoding
  • content: Return the content of the response, in bytes
  • cookies: Show the object containing the cookies returned by the server
  • elapsed: Time elapsed between sending the request and receiving the response
  • encoding: Show encoding used to decode r.text
  • headers: Return a dictionary of response headers
  • history: Return a list of response objects containing the request history
  • is_permanent_redirect: Show if URL is permanently redirected
  • is_redirect: Show if URL is redirected
  • links: Return the links HTTP header
  • next: Return an object for the next request in a redirection
  • ok: Show if status code is less than 400
  • reason: Textual explanation of the status code
  • request: Show the request object of the request sent for a given response
  • status_code: Show the status code returned by the server
  • text: Return the content of the response, in unicode
  • url: Show the URL of the response

Conclusion

If you are looking for an alternative to the requests library, you may be interested in the requests-HTML library that provides some built-in HTML parsing options.

This library is not only useful for web scraping, but also for web development and any other endeavour that uses APIs.

This concludes the introduction to the Python requests library.
