What are HTTP Requests

In this article, we will learn what HTTP Requests are, and how you can leverage them in SEO, web scraping and building web applications.

Understanding how HTTP requests work is essential for building and maintaining websites and web applications.

What is an HTTP Request

An HTTP request is a way for web clients (e.g. web browsers) to communicate with web servers over the internet.

When a client sends an HTTP request to a server, it is asking for some kind of resource (such as a webpage, an image, or a file).

The server then sends a response that contains the requested resource. If the resource is not available, it will send an error message.

Basics of Internet Communication

Whenever you load content on a website with your browser, you (the client) communicate with the server using TCP and HTTP.

What is HTTP

HTTP is used whenever accessing a website to load and interact with its content.

HTTP stands for Hypertext Transfer Protocol and it is used to structure the requests and responses exchanged on the Internet.

What is TCP

The transfer of resources on the internet relies on the Transmission Control Protocol (TCP).

TCP is the communication standard that allows the exchange of messages over a network.

What TCP does is:

Organizes data in a way that can be transmitted between a server and a client
Guarantees the integrity of the data being communicated
Manages Internet Connections
Splits large amounts of data into smaller packets

How Data is Transferred Through the Internet

When you type an address (e.g. https://www.jcchouinard.com) in your browser, you are establishing a TCP connection with the server that responds to that URL.

Once the connection is established, TCP keeps it open for as long as the communication lasts.

Then, the client (your browser) sends an HTTP GET request to the server to retrieve the document that it should display. The server returns an HTTP response.

Once the response is returned, the TCP connection is closed, unless it is kept alive so that it can be reused for further requests.
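The sequence above can be sketched with Python's standard library. This is only a minimal sketch: instead of a real website, it assumes a throwaway local server (the `HelloHandler` class and its reply are made up for illustration), so that it runs without Internet access.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# 1. Open a TCP connection to the server.
with socket.create_connection((host, port), timeout=5) as conn:
    # 2. Send an HTTP GET request over that connection.
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    conn.sendall(request.encode("ascii"))

    # 3. Read the HTTP response until the server closes the connection.
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)

raw_response = b"".join(chunks).decode("iso-8859-1")
status_line = raw_response.split("\r\n", 1)[0]
print(status_line)

server.shutdown()
server.server_close()
```

In practice you rarely handle the socket yourself. Libraries such as requests take care of the TCP connection for you.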

Structure of HTTP Requests

HTTP requests are made up of multiple parts:

HTTP Request method: Method used in the request (GET, POST, …).
Request URL: URL of the requested resource.
HTTP version: Version of the HTTP protocol (HTTP/1.1, HTTP/2).
HTTP Request headers: Additional info about the request (user-agent, accept headers, etc.).
HTTP Message body: Data sent within the request.
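To see how these parts fit together, here is a small sketch that assembles them into the text format used by HTTP/1.1 (HTTP/2 sends the same information in a binary format). The `build_request` function, the `/comments` path and the header values are assumptions made for the example.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP/1.1 request message as bytes."""
    # First line: method, request URL and HTTP version.
    lines = [f"{method} {target} {version}"]

    # Headers: one "Name: value" pair per line.
    headers = dict(headers)
    if body:
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]

    # An empty line separates the headers from the message body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request(
    "POST",
    "/comments",  # hypothetical path
    {"Host": "example.com", "Content-Type": "text/plain"},
    body=b"Nice post!",
)
print(raw.decode("ascii"))
```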

First Line of the HTTP Request

An HTTP request always starts with a line, called the request line, that states the method that you’re using, the request URL and the version of the HTTP protocol.

For example, an HTTP GET request looks like this:

GET /request-url HTTP/1.1

A request method (e.g. GET, POST)
A Request URL (/request-url in the line)
The version of the HTTP protocol (HTTP/1.1)
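As a quick illustration, the three parts can be pulled out of the request line with a few lines of Python. The `parse_request_line` helper and the `RequestLine` type are names invented for this sketch, and the validation is deliberately minimal.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    target: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD target HTTP/x.y' into its three parts."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {line!r}")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected protocol version: {version!r}")
    return RequestLine(method, target, version)


parsed = parse_request_line("GET /request-url HTTP/1.1")
print(parsed.method, parsed.target, parsed.version)
```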

HTTP Request Headers

In the HTTP request, request headers come after the request line and add information about the request.

Here’s what an HTTP request header looks like:

Host: example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 13_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36
Accept-Language: en-US
Cache-Control: max-age=0
Accept-Encoding: gzip, deflate
Connection: keep-alive

You can get an explanation of what each of those HTTP headers does on the Mozilla website.
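Each header is a simple "Name: value" pair, so a block of headers is easy to read into a dictionary. The sketch below is a simplification: `parse_headers` is a made-up helper, it lowercases the names because header names are case-insensitive, and it keeps only the last value if a header is repeated.

```python
raw_headers = """\
Host: example.com
Accept-Language: en-US
Cache-Control: max-age=0
Accept-Encoding: gzip, deflate
Connection: keep-alive
"""


def parse_headers(text):
    """Turn 'Name: value' lines into a dict keyed by lowercase name."""
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(raw_headers)
encodings = [item.strip() for item in headers["accept-encoding"].split(",")]
print(headers["host"], encodings)
```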

Structure of HTTP Responses

When you make a request, the server returns a response.

The structure of the response is similar to the structure of the request.

Request	Response
Request line	Status Line
Request Headers	Response Headers
Message Body	Message Body

Response Status Line

The first line of the response is the status line, which shows the version of the HTTP protocol and the HTTP status code.

If the resource loads successfully, you’ll see an HTTP 2XX status code (success):

HTTP/1.1 200 OK

If there is an error, the status line will show an error code (4XX, 5XX):

HTTP/1.1 400 Bad Request
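The first digit of the status code tells you which family the response belongs to. Here is a minimal sketch that splits a status line and names the family. `parse_status_line` and `CATEGORIES` are names chosen for this example, while `HTTPStatus` comes from Python's standard library.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(status_line):
    """Split 'HTTP/x.y CODE Reason' and classify the status code."""
    version, code, *rest = status_line.strip().split(" ", 2)
    code = int(code)
    return {
        "version": version,
        "code": code,
        "reason": rest[0] if rest else "",
        "category": CATEGORIES.get(code // 100, "unknown"),
    }


ok = parse_status_line("HTTP/1.1 200 OK")
bad = parse_status_line("HTTP/1.1 400 Bad Request")
print(ok["category"], "/", bad["category"])

# The standard library knows the usual reason phrase for each code.
print(HTTPStatus(400).phrase)
```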

Response Headers

A response header is an HTTP header used in an HTTP response that provides information about the content of the message returned. Response headers like Date, Last-Modified or Vary give context to the response.

Example response headers returned by the server.

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Mon, 18 Jul 2016 16:06:00 GMT
Last-Modified: Mon, 18 Jul 2016 02:36:04 GMT
Server: Apache
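HTTP headers use the same "Name: value" syntax as email headers, so Python's standard `email` package can read them (the built-in `http.client` module relies on it too). The sketch below parses the headers from the example above, without the status line, and computes how long before the response the page was last modified.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

raw_headers = (
    "Access-Control-Allow-Origin: *\n"
    "Connection: Keep-Alive\n"
    "Content-Encoding: gzip\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Date: Mon, 18 Jul 2016 16:06:00 GMT\n"
    "Last-Modified: Mon, 18 Jul 2016 02:36:04 GMT\n"
    "Server: Apache\n"
)

headers = message_from_string(raw_headers)

# Lookups are case-insensitive, like HTTP header names.
server = headers["server"]
content_type = headers.get_content_type()
charset = headers.get_content_charset()

# HTTP dates can be converted to datetime objects.
date = parsedate_to_datetime(headers["Date"])
last_modified = parsedate_to_datetime(headers["Last-Modified"])
age = date - last_modified

print(server, content_type, charset, age)
```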

Message Body

The HTTP message body contains the data that you are sending or receiving.

For example, here is the response to a GET request where the message body is “Hello world!”:

HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 12
Connection: close
Content-Type: text/html

Hello world!
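The empty line is what separates the headers from the message body. As a rough sketch, here is how a raw response (a shortened copy of the example above) can be split in two, and how the Content-Length header relates to the size of the body.

```python
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 12\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world!"
)

# Everything before the first empty line is the head, the rest is the body.
head, _, body = raw_response.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

headers = {}
for line in header_lines:
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

declared_length = int(headers["content-length"])
print(status_line)
print(body.decode("utf-8"), declared_length, len(body))
```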

Types of HTTP Request Methods

This section explains the commonly used HTTP methods.

Here is a list of some of the HTTP request methods available.

GET: Request data
POST: Publish data
PUT: Replace data
HEAD: Similar to a GET request but without the body
DELETE: Delete data
PATCH: Make partial changes to the data
CONNECT: Establish a tunnel to the server identified by the target resource
TRACE: Return the request that was received in the response
OPTIONS: Describe communication options for the target

GET Method

The GET method is the method used by your browser to request information from a server.

The last time you used a browser to access a webpage, you used the GET method without even realizing it!

The browser sends a request for a specific resource.
The resource is identified by a URI (Uniform Resource Identifier).
The server returns a response with the requested information along with the 200 (OK) status code.

GET methods are used across the Internet as the primary way to fetch information.
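Here is a minimal sketch of a GET request made with Python's built-in `http.client`. To stay runnable offline, it assumes a small local server: `GreetingHandler`, the `/greet` path and the `name` parameter are all invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class GreetingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: greets the name given in the query string."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        name = query.get("name", ["stranger"])[0]
        body = f"Hello, {name}!".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client asks for a resource identified by its URI.
conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("GET", "/greet?name=world")
response = conn.getresponse()
status = response.status
text = response.read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()
print(status, text)
```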

POST Method

Requesting a web page from a server with a GET request is only one of the possible HTTP methods a client can call. For example, while GET requests are used to retrieve resources (e.g. loading a page) from a web server, POST requests can be used to create new resources (e.g. posting a comment on social media).

The POST method is used to send data to a server. You can use this method for things like submitting a form or sharing content on social media.

Responses to POST requests are generally not cacheable, which means that if you are using a POST request on your website, crawlers such as Googlebot will usually not be able to cache the result.
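To make this more concrete, here is a sketch of what submitting a form looks like as a POST request, built with Python's standard library. The endpoint URL and the form fields are hypothetical, and the request is only prepared, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields are encoded into the message body of the request.
form = {"author": "Ada", "comment": "Great article"}
payload = urlencode(form).encode("ascii")

request = Request(
    "https://example.com/comments",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(request.get_method())
print(request.data)
```

To actually send it, you would pass `request` to `urllib.request.urlopen`.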

PUT Method

Based on the RFC, the PUT method is the HTTP request method that you can use to create or replace the state of a target resource with the representation enclosed in the request message content.

Uh, what?

Let’s make it clearer.

The PUT request is the HTTP request used to create something on a web server, like a file, or to replace it entirely if it already exists.

PUT Status Codes

With the PUT request, you will receive one of the following status codes:

201: PUT creates a new representation of the target resource
200 or 204: PUT modifies an existing representation
409 or 415: PUT is inconsistent with the target resource

A representation here is information that is intended to reflect a past, current, or desired state of a given resource.
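The difference between creating and replacing can be sketched with a toy server. Everything here is an assumption made for the demo: the in-memory `documents` store, the `PutHandler` class, the `/notes/1` path, and the choice of 204 rather than 200 when a resource is replaced.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

documents = {}  # in-memory "storage": path -> bytes


class PutHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: stores the request body under the request path."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        existed = self.path in documents
        documents[self.path] = body
        if existed:
            self.send_response(204)  # replaced an existing representation
        else:
            self.send_response(201)  # created a new representation
            self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def put(host, port, path, body):
    """Send one PUT request and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("PUT", path, body=body)
    response = conn.getresponse()
    response.read()
    conn.close()
    return response.status


server = HTTPServer(("127.0.0.1", 0), PutHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

first_status = put(host, port, "/notes/1", b"first version")
second_status = put(host, port, "/notes/1", b"second version")

server.shutdown()
server.server_close()
print(first_status, second_status, documents["/notes/1"])
```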

HEAD Method

The HEAD method acts like GET, with the difference that the server doesn’t return a body (content) in the response. This is to minimize the amount of data that is generated and transferred.

It is used to get the metadata without transferring the actual data.

The HEAD method can be useful for:

testing hypertext links
finding recent changes

The HEAD method is often used by web crawlers to improve efficiency.

The HEAD response may have minor inconsistencies with the GET response. Still better than generating content that you don’t need just for a HEAD request.

You can gain further efficiency with the HEAD request because it is cacheable. With caching, your previous request can be used to satisfy your next HEAD requests. That is, unless the Cache-Control header field says that you can’t. The HEAD response may affect previously cached GET responses.
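Here is a small sketch comparing HEAD and GET on the same resource. It assumes a local demo server (`PageHandler` and `PAGE` are made up for the example) that sends identical headers for both methods, and shows that only GET transfers the content.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><head><title>Demo</title></head><body>Hi</body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving one fixed page for GET and HEAD."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()  # same headers, but no body is written

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(method, host, port):
    """Return (status, Content-Length header, number of body bytes received)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request(method, "/")
    response = conn.getresponse()
    body = response.read()
    content_length = response.getheader("Content-Length")
    conn.close()
    return response.status, content_length, len(body)


server = HTTPServer(("127.0.0.1", 0), PageHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

head_result = fetch("HEAD", host, port)
get_result = fetch("GET", host, port)

server.shutdown()
server.server_close()
print("HEAD:", head_result)
print("GET: ", get_result)
```

Both responses announce the same Content-Length, but the HEAD response carries zero bytes of body.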

DELETE Method

The DELETE method is the HTTP request method used to ask a server to remove something from its system.

You can use the DELETE request to delete resources from a server, and the server will return one of these status codes (source: RFC):

a 202 (Accepted) status code if the action will likely succeed but has not yet been enacted,
a 204 (No Content) status code if the action has been enacted and no further information is to be supplied, or
a 200 (OK) status code if the action has been enacted and the response message includes a representation describing the status.
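As an illustration, here is a toy server that answers 204 (No Content) once a resource has been removed. The `resources` store and `DeleteHandler` are invented for the sketch, and replying 404 when the resource does not exist is an assumption of this demo, not something listed above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

resources = {"/notes/1": b"draft"}  # in-memory "storage": path -> bytes


class DeleteHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: removes the resource stored at the request path."""

    def do_DELETE(self):
        if self.path in resources:
            del resources[self.path]
            self.send_response(204)  # enacted, nothing more to say
        else:
            self.send_response(404)  # assumption: unknown resources answer 404
            self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def delete(host, port, path):
    """Send one DELETE request and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("DELETE", path)
    response = conn.getresponse()
    response.read()
    conn.close()
    return response.status


server = HTTPServer(("127.0.0.1", 0), DeleteHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

first_status = delete(host, port, "/notes/1")
second_status = delete(host, port, "/notes/1")

server.shutdown()
server.server_close()
print(first_status, second_status, resources)
```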

PATCH Method

The PATCH method is the HTTP request method that you can use to make partial changes to an existing resource. You can use this method to make small changes to a document instead of replacing the entire document.
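The difference with PUT can be sketched without any network at all. The two functions below are hypothetical server-side helpers that treat a document as a Python dictionary. Real servers define their own patch formats, so the simple "merge the changed fields" rule is only an assumption for the example.

```python
def replace_document(current, new_document):
    """PUT-style update: the stored document becomes exactly what was sent."""
    return dict(new_document)


def patch_document(current, changes):
    """PATCH-style update: only the fields present in `changes` are modified."""
    updated = dict(current)
    updated.update(changes)
    return updated


article = {"title": "HTTP Requests", "status": "draft", "lang": "en"}

after_patch = patch_document(article, {"status": "published"})
after_put = replace_document(article, {"status": "published"})

print(after_patch)
print(after_put)
```

With PATCH the title and language survive, while the PUT-style replacement keeps only what was sent.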

CONNECT Method

You can use the CONNECT method to establish a tunnel to a server, usually through a proxy. It is most often used to open HTTPS (TLS) connections through an HTTP proxy.

TRACE Method

If you want to diagnose the path taken by your request through the network, you can use the TRACE method, which asks the server to echo back the request that it received. This is useful for debugging.

OPTIONS Method

Use the OPTIONS method to get the HTTP methods and options available for a resource. Use this before making a request to ask the server what you can do with a resource.
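Servers typically list the supported methods in the Allow response header. Here is a minimal sketch against a local demo server. The `OptionsHandler` class and the list of methods it advertises are assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OptionsHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: advertises the methods it accepts."""

    def do_OPTIONS(self):
        self.send_response(204)
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), OptionsHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("OPTIONS", "/")
response = conn.getresponse()
response.read()
status = response.status
allowed = [m.strip() for m in response.getheader("Allow", "").split(",")]
conn.close()

server.shutdown()
server.server_close()
print(status, allowed)
```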

HTTP requests in Web Scraping

In web scraping, to extract data from a website, identify the URL that you want to extract data from and use a Python HTTP library such as requests to send an HTTP GET request and retrieve the content.

# Making an HTTP Request
import requests
 
url = 'https://crawler-test.com/'
response = requests.get(url)

You can also use an alternative HTTP library in Python such as urllib or http.client (which was named httplib in Python 2).

Then, parse the response using a library such as BeautifulSoup.

# Parse the HTML
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
soup.find('title')

Afterwards, store the extracted data in a structured format such as CSV or JSON.
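Here is a minimal sketch of both formats using only the standard library. The `rows` list stands in for data that you would have extracted, and its titles are placeholder values. The output is written to strings to keep the example self-contained.

```python
import csv
import io
import json

# Placeholder rows standing in for scraped data.
rows = [
    {"url": "https://crawler-test.com/", "title": "Crawler Test"},
    {"url": "https://example.com/", "title": "Example Domain"},
]

# CSV: one header line, then one line per row.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["url", "title"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# JSON: the same rows as a list of objects.
json_text = json.dumps(rows, indent=2)

print(csv_text)
print(json_text)
```

To write to disk instead, you could replace the buffer with a file opened with `open("pages.csv", "w", newline="")`, where the file name is up to you.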

To avoid being detected as a scraper, route your requests through proxies, set realistic HTTP headers and handle cookies.
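These three pieces can be configured in one place. The sketch below uses the standard library's urllib rather than requests, only to stay dependency-free. The proxy address and the User-Agent string are placeholders, and the line that would send the request is left commented out.

```python
import urllib.request
from http.cookiejar import CookieJar

# Cookies received from the server are stored here and sent back later.
cookie_jar = CookieJar()

# Placeholder proxy address: replace it with a proxy that you control.
proxy = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
})

opener = urllib.request.build_opener(
    proxy,
    urllib.request.HTTPCookieProcessor(cookie_jar),
)

# Headers sent with every request made through this opener.
opener.addheaders = [
    ("User-Agent", "my-crawler/0.1 (+https://example.com/bot)"),  # placeholder
    ("Accept-Language", "en-US"),
]

# response = opener.open("https://crawler-test.com/", timeout=10)
print(dict(opener.addheaders))
```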

HTTP Requests Python Libraries

Library	Key Features	Use Cases
python requests	– Simple API – Handle most requests methods (GET, POST, PUT, DELETE, …) – Handles cookies and redirects – Supports authentication and SSL verification	– Retrieving data from APIs – Scraping data from websites – Testing web applications – Any other general-purpose HTTP requests
python urllib	– Handle most requests methods (GET, POST, PUT, DELETE, …) – Supports cookies and basic authentication – Handles redirects	– Retrieving data from APIs – Scraping data from websites – Testing web applications
python httplib (Python 2 only; renamed http.client in Python 3)	– Low-level library – Fine-grained control over HTTP requests and responses – Supports multiple connections	– Advanced HTTP requests – Network programming – Testing web applications
python aiohttp	– Asynchronous library – Handle most requests methods (GET, POST, PUT, DELETE, …) – Supports web sockets – Supports SSL verification – Provides client and server functionality for asyncio	– High-performance web scraping – Real-time web applications – Any other use case that requires asynchronous HTTP requests
python http.client	– Low-level control over HTTP requests and responses – Supports multiple connections	– Advanced HTTP requests – Network programming – Testing web applications
python requests-html	– Handle most requests methods (GET, POST, PUT, DELETE, …) – Allows JavaScript execution	– JavaScript Rendered Website Scraping – Requests requiring persistent session cookie

Conclusion

We have covered the essentials of HTTP requests.

We have learned about:

HTTP and TCP
Structure of HTTP Requests and Responses
Types of HTTP methods (GET, POST, PUT, HEAD, DELETE, PATCH, CONNECT, TRACE, and OPTIONS)

Let’s learn how to make HTTP requests in Python.

Jean-Christophe Chouinard

SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.