Python Requests Library (with Examples)

The Python requests library is used to make HTTP requests in Python.

Python Requests Example

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.text

In this tutorial, you will learn how to use the Python requests module with examples:

  • Understand the structure of a request
  • Make GET and POST requests
  • Read and extract elements of the HTML of a web page
  • Improve your requests

Let’s learn how to use Python Requests.


How to Use Python Requests

Follow these steps to use the Python requests module.

  1. Install the Python Requests Package

    $ pip install requests

  2. Import the Requests Module

    import requests

  3. Make a Request using the GET method

    Use the GET method and store the response in a variable.
    r = requests.get(url)

  4. Read the response using the response object’s attributes and methods

    You can interact with the Python response object using its attributes (e.g. r.status_code) and methods (e.g. r.json()).
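
Putting the four steps together, here is a minimal end-to-end sketch, using the same test site as the rest of this tutorial:

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

print(r.status_code)             # e.g. 200
print(r.headers['Content-Type']) # e.g. 'text/html;charset=utf-8'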

Download and Install Python Requests

Use pip to install the latest version of python requests.

$ pip install requests

For this guide, you will also need to install the BeautifulSoup parsing library.

$ pip install beautifulsoup4

The urllib module used later in this tutorial is part of the Python standard library, so it does not need to be installed.

Import the Requests Module

To import the requests library in Python, use the import keyword.

import requests

Python Requests Methods

Below are listed the Python requests methods:

  • get: Sends a GET request to a given URL. E.g. get(url, parameters, arguments)
  • post: Sends a POST request to publish specified data to a given URL. E.g. post(url, data, json, arguments)
  • put: Sends a PUT request to replace data at a given URL. E.g. put(url, data, arguments)
  • patch: Sends a PATCH request to make partial changes to the data of a given URL. E.g. patch(url, data, arguments)
  • delete: Sends a DELETE request to delete data from a given URL. E.g. delete(url, arguments)
  • head: Sends a HEAD request to a given URL. This is similar to a GET request, but without the body. E.g. head(url, arguments)
  • options: Sends an OPTIONS request to query the communication options available for a given URL. E.g. options(url)
  • Request: Creates a Request object on which you specify the HTTP method explicitly (see the sketch below)

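As a sketch of that last item, a Request object can be built, prepared and sent through a Session. This is a minimal example against the httpbin.org test service:

import requests

# Build the Request object explicitly, specifying the HTTP method
req = requests.Request('GET', 'https://httpbin.org/get', params={'q': 'python'})

with requests.Session() as session:
    prepared = session.prepare_request(req)  # apply session settings (headers, cookies, ...)
    response = session.send(prepared)
    print(response.url)  # https://httpbin.org/get?q=python
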
We will now view some examples of requests methods in Python.

Python Get Requests

The python requests get() method sends GET requests to a web server for a given URL, set of parameters and arguments. The get() method follows this pattern:

get(url, parameters, arguments)

How to Send Get Requests in Python

To send GET requests in Python, use the get() method with the URL that you want to retrieve information from.

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

print('URL: ', response.url)
print('Status code: ', response.status_code)
print('HTTP header: ', response.headers)

Output:

URL:  https://crawler-test.com/
Status code:  200
HTTP header:  {'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Sun, 03 Oct 2021 23:41:59 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8098', 'Connection': 'keep-alive'}

Post Requests

The python requests post() method sends POST requests to a web server to publish specified data to a given URL. The post() method follows this pattern:

post(url, data, json, arguments)

How to Send Post Requests in Python

To send POST requests in Python, use the post() method with the URL, and pass a dictionary of the data to be published to the data parameter.

import requests

url = 'https://httpbin.org/post'

payload = {
    'name':'Jean-Christophe',
    'last_name':'Chouinard',
    'website':'https://www.jcchouinard.com/'
    }

response = requests.post(url, data = payload)

response.json()

Output:

{'args': {},
 'data': '',
 'files': {},
 'form': {'last_name': 'Chouinard',
  'name': 'Jean-Christophe',
  'website': 'https://www.jcchouinard.com/'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Content-Length': '85',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615a4271-417e9fff3c75f47f3af9fde2'},
 'json': None,
 'origin': '149.167.130.162',
 'url': 'https://httpbin.org/post'}
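
The post() method also accepts a json parameter. Passing a dictionary to json serializes it and sets the Content-Type header to application/json automatically:

import requests

r = requests.post('https://httpbin.org/post', json={'name': 'Jean-Christophe'})

r.json()['json']
# {'name': 'Jean-Christophe'}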

Python Response Object’s Methods and Attributes

The requests Response object contains the server’s response to the HTTP request.

You can investigate the details of the Response object by using help().

import requests

url = 'https://crawler-test.com/'
response = requests.get(url)

help(response)

In this tutorial we will look at the following:

  • text, data descriptor: Content of the response, in unicode
  • content, data descriptor: Content of the response, in bytes
  • url, attribute: URL of the request
  • status_code, attribute: Status code returned by the server
  • headers, attribute: HTTP headers returned by the server
  • history, attribute: List of response objects holding the history of the request
  • links, attribute: Parsed header links of the response, if any
  • json, method: Returns the JSON-encoded content of the response, if any

Access the Response Methods and Attributes

The response from the request is an object in which you can access its methods and attributes.

You can access the attributes using the object.attribute notation and the methods using the object.method() notation.

import requests

url = 'http://archive.org/wayback/available?url=jcchouinard.com'
response = requests.get(url)

response.text   # access response data attributes and descriptors
response.json() # access response methods

Output:

{'url': 'jcchouinard.com',
 'archived_snapshots': {'closest': {'status': '200',
   'available': True,
   'url': 'http://web.archive.org/web/20210930032915/https://www.jcchouinard.com/',
   'timestamp': '20210930032915'}}}

Process the Python Response

How to Access the JSON of Python Requests

In Python requests, the response.json() method allows you to access the JSON object of the response. If the content of the response is not valid JSON, the JSON decoder raises the requests.exceptions.JSONDecodeError exception.
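
Here is a minimal sketch of handling that exception. Note that requests.exceptions.JSONDecodeError was added in requests 2.27; older versions raise a ValueError instead:

import requests

url = 'https://crawler-test.com/'  # returns HTML, not JSON
r = requests.get(url)

try:
    data = r.json()
except requests.exceptions.JSONDecodeError:
    print('The response does not contain valid JSON')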

How to Show the Status Code of a Python Request

To show the status code returned by a Python get() request, use the status_code attribute of the Response object.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.status_code
# 200

How to Get the HTML of the Page with Python Requests

To get the HTML of a web page using Python requests, make a GET request to the given URL. Then, use the text attribute of the Response object to get the HTML as Unicode, or the content attribute to get it as bytes.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)

r.text # get content as a string
r.content # get content as bytes

How to Show the HTTP Headers of a GET Request

To show the HTTP headers returned in response to a Python GET request, use the headers attribute of the Response object.

import requests

url = 'https://crawler-test.com/'
r = requests.get(url)
r.headers

Output:

{'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Tue, 05 Oct 2021 04:23:27 GMT', 'Server': 'nginx/1.10.3', 'Vary': 'Accept-Encoding', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '8099', 'Connection': 'keep-alive'}
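
The headers attribute behaves like a case-insensitive dictionary, so individual headers can be read with any casing:

r.headers['Content-Type']      # 'text/html;charset=utf-8'
r.headers.get('content-type')  # same value, the lookup is case-insensitive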

How to Show HTTP Redirects with Python Requests

To show the HTTP redirects that happened during a Python get() request, use the history attribute of the Response object. Loop over response.history and read the .url and .status_code attributes of each element of the history.

import requests

url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url)

for redirect in r.history:
    print(redirect.url, redirect.status_code)
print(r.url, r.status_code)

Output:

https://crawler-test.com/redirects/redirect_chain_allowed 301
https://crawler-test.com/redirects/redirect_chain_disallowed 301
https://crawler-test.com/redirects/redirect_target 200
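
To inspect a redirect without following it, pass allow_redirects=False to the get() method:

import requests

url = 'https://crawler-test.com/redirects/redirect_chain_allowed'
r = requests.get(url, allow_redirects=False)

print(r.status_code)          # 301, the redirect response itself
print(r.headers['Location'])  # URL that the redirect points to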

Parse the HTML with Request and BeautifulSoup

BeautifulSoup is a Python library that allows you to parse HTML and XML documents and pull data from them. BeautifulSoup can be used to parse the HTML returned in the Python Response object.

How the Python Response Object Returns the HTML

The Python Response object returns the HTML from the URL passed in the get() request, as Unicode or bytes.

The textual format of the returned HTML makes it hard to extract information from it.

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

r.text[:500]

Output:

'<!DOCTYPE html>\n<html>\n  <head>\n    <title>Crawler Test Site</title>\n    \n      <meta content="en" HTTP-EQUIV="content-language"/>\n         \n    <link type="text/css" href="/css/app.css" rel="stylesheet"/>\n    <link type="image/x-icon" href="/favicon.ico?r=1.6" rel="icon"/>\n    <script type="text/javascript" src="/bower_components/jquery/jquery.min.js"></script>\n    \n      <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>\n    \n\n    \n        <link rel="alternate" media'

How to Parse HTML with BeautifulSoup

To parse the HTML returned from a get() request, pass the response.text attribute to the BeautifulSoup class of the bs4 library. Use the ‘html.parser’ argument to parse the HTML.

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')
soup

This returns a soup object from which data can be extracted.

<!DOCTYPE html>

<html>
<head>
<title>Crawler Test Site</title>
<meta content="en" http-equiv="content-language"/>
<link href="/css/app.css" rel="stylesheet" type="text/css"/>
...
</html>

The output is easier to interpret now that it has been parsed with BeautifulSoup.

You can extract tags using the find() or find_all() methods.

soup.find('title')

Output:

<title>Crawler Test Site</title>

soup.find_all('meta')

Output:

[<meta content="en" http-equiv="content-language"/>,
 <meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>,
 <meta content="nositelinkssearchbox" name="google"/>,
 <meta content="0H-EBys8zSFUxmeV9xynoMCMePTzkUEL_lXrm9C4a8A" name="google-site-verification"/>]

You can also select a tag by its attributes.

soup.find('meta', attrs={'name':'description'})

Output:

<meta content="Default description XIbwNE7SSUJciq0/Jyty" name="description"/>

How to Get the Main SEO tags from a Webpage

To extract the main SEO tags from a web page, use requests along with the BeautifulSoup parsing library. The find() method of the soup object will allow you to extract HTML tags such as the H1, the title, the meta description, and other important SEO tags.

from bs4 import BeautifulSoup
import requests

# Make the request
url = 'https://crawler-test.com/'
r = requests.get(url)

# Parse the HTML
soup = BeautifulSoup(r.text, 'html.parser')

# Get the HTML tags
title = soup.find('title')
h1 = soup.find('h1')
description = soup.find('meta', attrs={'name':'description'})
meta_robots =  soup.find('meta', attrs={'name':'robots'})
canonical = soup.find('link', {'rel': 'canonical'})

# Get the text from the HTML tags
title = title.get_text() if title else ''
h1 = h1.get_text() if h1 else ''
description = description['content'] if description else ''
meta_robots =  meta_robots['content'] if meta_robots else ''
canonical = canonical['href'] if canonical else ''

# Print the tags
print('Title: ', title)
print('h1: ', h1)
print('description: ', description)
print('meta_robots: ', meta_robots)
print('canonical: ', canonical)

Output:

Title:  Crawler Test Site
h1:  Crawler Test Site
description:  Default description XIbwNE7SSUJciq0/Jyty
meta_robots:  
canonical:  

How to Extract All the Links on a Page

To extract all the links on a web page, find the <a> tags with the find_all() method and join each relative href to the domain with urljoin.

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

url = 'https://crawler-test.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

links = []
for link in soup.find_all('a', href=True):
    full_url = urljoin(url, link['href']) # join domain to path
    links.append(full_url)

# Show 5 links
links[:5]

Output:

['https://crawler-test.com/',
 'https://crawler-test.com/mobile/separate_desktop',
 'https://crawler-test.com/mobile/desktop_with_AMP_as_mobile',
 'https://crawler-test.com/mobile/separate_desktop_with_different_h1',
 'https://crawler-test.com/mobile/separate_desktop_with_different_title']

Python Requests Headers

HTTP request headers are used to pass additional information with an HTTP request or response, without changing the behaviour of the request. In Python requests, headers are simply passed as a dictionary with the request.

How to Add Request Headers to a Python Request

To add request headers to your GET and POST requests, pass a dictionary to the headers parameter of the get() and post() methods.

import requests 

url = 'http://httpbin.org/headers'

# Add a custom header to a GET request
r = requests.get(url, headers={'Content-Type': 'text/plain'})

# Add a custom header to a POST request
r = requests.post(url, headers={'Authorization': 'Bearer {access_token}'})

How to Add an Access Token to the Headers of the Request

To add an access token to a Python request, pass a dictionary to the headers parameter of the get() request.

import requests 

url = 'http://httpbin.org/headers'

access_token = {
    'Authorization': 'Bearer {access_token}'
    }

r = requests.get(url, headers=access_token)
r.json()

Output:

{'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Authorization': 'Bearer {access_token}',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.24.0',
  'X-Amzn-Trace-Id': 'Root=1-615aa0b3-5c680dcb575d50f22e9565eb'}}

Query String Parameters

Query string parameters allow you to customize a Python request by passing values along with the URL.

Most APIs require you to add query parameters to the request. This is the case with the Wikipedia API.

How to Add Parameters to the URL of a Python Request

To add query string parameters to a Python request, pass a dictionary of parameters to the params argument. The resulting request URL is shown below.

import requests
 
url = 'https://en.wikipedia.org/w/api.php'

params = {
        'action': 'query',
        'format': 'json',
        'titles': 'Requests (software)',
        'prop': 'extracts'
    }
 
response = requests.get(url, params=params)

print('Request URL:', response.url)

Output:

Request URL: https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Requests+%28software%29&prop=extracts

You can then parse the JSON response:

data = response.json()

page = next(iter(data['query']['pages'].values()))
print(page['extract'][:73])

How to Handle Exceptions with Python Requests

To deal with exceptions raised by Python requests, surround your request with the try and except statements.

import requests

url = 'bad url'

try:
    r = requests.get(url)
except Exception as e:
    print(f'There was an error: {e}')

Output:

There was an error: Invalid URL 'bad url': No schema supplied. Perhaps you meant http://bad url?

How to Change User-Agent in Your Python Request

To change the user-agent of your Python request, pass a dictionary to the headers parameter of the get() request.

import requests 

url = 'https://www.reddit.com/r/python/top.json?limit=1&t=day'

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}

r = requests.get(url, headers=headers)

How to Add Timeouts to a Request in Python

To add a timeout to a Python request, pass a float value to the timeout parameter of the get() request.

import requests

url = 'http://httpbin.org/basic-auth/user/pass'

try:
    r = requests.get(url, timeout=0.1)
    print(r.status_code)
except Exception as e:
    print(e)

If the server does not respond within 0.1 second, the request fails with a timeout error:

HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /basic-auth/user/pass (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb03a7fa290>, 'Connection to httpbin.org timed out. (connect timeout=0.1)'))

If the server does respond in time, the status code is printed instead (401 for this endpoint, which requires authentication).
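
The timeout parameter also accepts a (connect, read) tuple, which sets the connection timeout and the read timeout separately:

import requests

# Wait up to 3.05s to connect, then up to 27s for the server to send data
r = requests.get('https://httpbin.org/get', timeout=(3.05, 27))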

How to use Proxies with Python Requests

To use proxies with Python requests, pass a dictionary to the proxies parameter of the get() request, mapping each URL scheme to a proxy address.

import requests 

url = 'https://crawler-test.com/'

proxies = {
    'http': 'http://128.199.237.57:8080',
    'https': 'http://128.199.237.57:8080'
}

r = requests.get(url, proxies=proxies)

You can find free proxies on proxyscrape (though they may already be blocked). If you start to scale, however, you will need a premium proxy service.

Requests Sessions

The requests Session() object is used to make requests with parameters that persist through all the requests in a single session.

How to use the Requests Session Object

# Request Session
import requests
 
url = 'https://httpbin.org/headers'

# Create HTTP Headers
access_token = {
    'Authorization': 'Bearer {access_token}'
    }

with requests.Session() as session:
 
    # Add HTTP Headers to the session
    session.headers.update(access_token)
    
    # Make First Request
    r1 = session.get(url)

    # Make Second Request
    r2 = session.get(url)

    # Show HTTP Headers
    print('r1: ', r1.json()['headers']['Authorization'])
    print('r2: ', r2.json()['headers']['Authorization'])

Output:

r1:  Bearer {access_token}
r2:  Bearer {access_token}
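
Cookies set by the server also persist across the requests of a session. Here is a minimal sketch using httpbin.org's cookie endpoints:

import requests

with requests.Session() as s:
    # The first request sets a cookie on the session
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

    # The second request automatically sends the cookie back
    r = s.get('https://httpbin.org/cookies')
    print(r.json())
    # {'cookies': {'sessioncookie': '123456789'}}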

How to Retry Failed Python Requests

To retry failed requests automatically, mount an HTTPAdapter configured with a Retry object onto a Session. The example below retries up to 3 times on 500, 502, 503 and 504 responses, with an exponentially increasing delay between attempts.

import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()

retries = Retry(total=3,
                backoff_factor=0.1,
                status_forcelist=[500,502,503,504])
                
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))


try:
    r = s.get('https://httpstat.us/500')
except Exception as e:
    print(type(e))
    print(e)

Other HTTP Methods

On top of GET and POST requests, the requests library allows you to use other popular HTTP methods such as HEAD, PUT, DELETE, PATCH and OPTIONS.

requests.head('https://httpbin.org/get') # Retrieve the HTTP headers only, without the body
requests.put('https://httpbin.org/put', data={'key':'value'})  # Create new resource with payload
requests.delete('https://httpbin.org/delete') # Delete resource
requests.patch('https://httpbin.org/patch', data={'key':'value'}) # Partial modification
requests.options('https://httpbin.org/get') # Specify communication options

What is the Python Requests Library?

The Python requests library is an HTTP library that allows users to send HTTP requests using Python. Its tagline, “Python HTTP for Humans”, represents the simplicity of the package well.

Facts about Python Requests

Author: Kenneth Reitz
Language: Python
Methods: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
First release: 2011-02-14

Response Methods

  • close(): Close the connection to the server
  • iter_content(): Iterate over the response content
  • iter_lines(): Iterate over each line of the response content
  • json(): Return the JSON object of the response. Raises an error if the response is not JSON
  • raise_for_status(): Raise an HTTPError if the request returned an error status code

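For example, raise_for_status() and iter_lines() can be combined to fail fast on HTTP errors and iterate over the body line by line:

import requests

r = requests.get('https://crawler-test.com/')
r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses

# Iterate over the response body line by line
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(line)
        break  # show only the first non-empty line
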
Response Properties

  • apparent_encoding: Return the apparent encoding
  • content: Return the content of the response, in bytes
  • cookies: Show the object containing the cookies returned by the server
  • elapsed: Time elapsed between sending the request and receiving the response
  • encoding: Show the encoding used to decode r.text
  • headers: Return a dictionary of response headers
  • history: Return a list of response objects containing the request history
  • is_permanent_redirect: Show whether the URL is permanently redirected
  • is_redirect: Show whether the URL is redirected
  • links: Return the parsed header links of the response
  • next: Return the object for the next request in a redirection
  • ok: Show whether the status code is less than 400
  • reason: Textual explanation of the status code
  • request: Show the request object that was sent for a given response
  • status_code: Show the status code returned by the server
  • text: Return the content of the response, in unicode
  • url: Show the URL of the response

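A few of these properties in action:

import requests

r = requests.get('https://crawler-test.com/')

print(r.ok)                       # True, the status code is below 400
print(r.reason)                   # 'OK'
print(r.elapsed.total_seconds())  # time between sending the request and the response arrival
print(r.encoding)                 # encoding used to decode r.text, e.g. 'utf-8'
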
Conclusion

If you are looking for an alternative to the requests library, you may be interested in the requests-HTML library that provides some built-in HTML parsing options.

This library is not only useful for web scraping, but also for web development and any other endeavour that uses APIs.

This concludes the introduction to the Python requests library.
