Web Scraping with CSS Selectors (with Python Examples)

In this tutorial, you will learn what CSS Locators are and how to leverage CSS Selectors in web scraping.

CSS Selectors allow you to locate exact elements within an HTML document. They are also supported by most web scraping tools, which makes them very useful for web scraping.

What is a CSS Selector

CSS Selectors use the Cascading Style Sheets rules to locate elements in an HTML document.




For example, in CSS, you could use this rule to apply a color to all links on a page that have a class named “myclass”.

a.myclass {
	color: blue;
}

Similarly, in web scraping, you can use the same CSS selector to select all the links that have that class:

a.myclass
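As a minimal sketch of how this selector could be used from Python, the example below passes a.myclass to BeautifulSoup's select() method. The HTML snippet and its URLs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<a class="myclass" href="/one">First</a>
<a href="/two">Second</a>
<a class="myclass other" href="/three">Third</a>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every element that matches the CSS selector
links = soup.select("a.myclass")
hrefs = [link["href"] for link in links]

print(hrefs)  # ['/one', '/three']
```

The third link is matched even though it has a second class, because a class selector only requires the class to be present.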

Why CSS Selectors are Useful in Web Scraping

CSS Selectors are useful in web scraping because they allow you to:

  • locate the element you want to extract from a webpage,
  • identify and extract data from HTML documents quickly,
  • automate the scraping of webpages.

Simplest Way to Find CSS Selectors in Chrome

Chrome DevTools has a feature that allows you to find the CSS selector of any DOM element without any prior knowledge.

Open Chrome DevTools with Command + Option + I on macOS (Ctrl + Shift + I on Windows and Linux), or by using right-click > Inspect.

Then right-click on any element in the DOM and select Copy > Copy selector.

Basics of CSS Selectors

CSS selectors are patterns used to describe the location of element(s) within an HTML document.

They consist of one or more steps separated by a combinator, such as a white space or the > symbol.

The last step on the right of the selector expression is the element that you want to select.

For example, the CSS Selector below locates the h1, found within the HTML body element:

html > body > h1

Basic Structure of the CSS Selectors

The basic structure of a CSS Selector is similar to the structure used in XPath, or to the structure of a URL path, where the > symbol replaces the slash symbol (/).

xpath: /html/body/p
CSS selector: html > body > p

The CSS locator is represented by a series of steps. Each step is separated by the greater-than (>) symbol, which moves forward one generation.
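To make the slash-to-> correspondence concrete, here is a small sketch that converts a simple absolute XPath into the matching CSS selector. The function name xpath_path_to_css is hypothetical, and the sketch only handles plain tag names separated by single slashes.

```python
def xpath_path_to_css(xpath: str) -> str:
    """Convert a simple absolute XPath such as /html/body/p to a CSS selector.

    Assumption for this sketch: the path contains only tag names separated
    by single slashes. Predicates, attributes and // are rejected.
    """
    if "//" in xpath or "[" in xpath or "@" in xpath:
        raise ValueError("only simple /tag/tag paths are supported in this sketch")
    # Drop the empty string produced by the leading slash
    steps = [step for step in xpath.split("/") if step]
    # Each XPath step becomes one generation in the CSS selector
    return " > ".join(steps)


print(xpath_path_to_css("/html/body/p"))  # html > body > p
```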

Types of CSS Selectors (Filters)

There are several types of CSS selectors that can be used to target HTML elements on a web page.

  • Tag Selector
  • Class Selector
  • ID Selector
  • Attribute Selector
  • Descendant Selector
  • Child Selector
  • Pseudo-Class Selector

Basic CSS Selectors Cheatsheet

| Selector | Description | Example | Explanation |
| --- | --- | --- | --- |
| Tag Selector | Selects elements based on their tag name. | p | Selects all <p> elements. |
| Class Selector | Selects elements based on their class name. | .example | Selects all elements with the class name "example". |
| ID Selector | Selects an element based on its ID. | #example | Selects the element with the ID "example". |
| Attribute Selector | Selects elements based on their attribute and value. | [type="text"] | Selects all elements with the attribute "type" and value "text". |
| Descendant Selector | Selects elements that are descendants of another element. | div p | Selects all <p> elements that are descendants of a <div> element. |
| Child Selector | Selects elements that are direct children of another element. | ul > li | Selects all <li> elements that are direct children of a <ul> element. |
| Pseudo-Class Selector | Selects elements based on their state or position in the document. | a:hover | Selects all <a> elements when the mouse is hovering over them. |

Tag Selector

Tag selectors can be used to target specific HTML elements.

For example, you could use the title, h1, p or a tag selectors in order to select each element.

<html>
    <head>
        <title>Title of your web page</title>
    </head>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>

To select the h1, here is the selector that you would use.

h1
<h1>Heading of the page</h1>
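The tag selector above can be tried from Python with a short sketch like this one, which assumes BeautifulSoup is installed and reuses the sample HTML. select_one() returns the first match (or None), while select() returns a list of all matches.

```python
from bs4 import BeautifulSoup

html = """<html>
    <head>
        <title>Title of your web page</title>
    </head>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# First element matching the tag selector
heading = soup.select_one("h1")
# Every element matching the tag selector
paragraphs = soup.select("p")

print(heading.get_text())  # Heading of the page
print(len(paragraphs))     # 3
```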

Class Selector

This example shows how to use the dot notation to select the elements with a certain class.

This expression selects all the <p> HTML tags that have a class attribute with “paragraph” as its value.

p.paragraph
<p id="first-paragraph" class="paragraph">Paragraph of text</p>
 <p class="paragraph">Paragraph of text 2</p>
 <p class="paragraph">Nested paragraph</p>

ID Selector

An ID Selector uses the hash symbol (#) to target an element by its unique id.

p#first-paragraph
<p id="first-paragraph" class="paragraph">Paragraph of text</p>
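As a quick check of the class and ID selectors, the sketch below runs both against the sample HTML with BeautifulSoup (an assumption; any CSS-aware parser would do) and prints the text of each match.

```python
from bs4 import BeautifulSoup

html = """<html>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Class selector: every <p> whose class list contains "paragraph"
by_class = [p.get_text() for p in soup.select("p.paragraph")]
# ID selector: the <p> whose id is "first-paragraph"
by_id = [p.get_text() for p in soup.select("p#first-paragraph")]

print(by_class)  # ['Paragraph of text', 'Paragraph of text 2', 'Nested paragraph']
print(by_id)     # ['Paragraph of text']
```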

Attribute Selector

An attribute Selector selects elements based on their attribute and value.

It uses the syntax:

tag-name[attribute="attribute-value"]

Example:

a[href="/a-link"]
 <a href="/a-link">hyperlink</a>
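Attribute selectors can also test for the mere presence of an attribute, or for a prefix or suffix of its value. The sketch below illustrates a few variants with BeautifulSoup; the HTML snippet and its URLs are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<a href="/a-link">hyperlink</a>
<a href="https://example.com/report.pdf">report</a>
<a name="top">named anchor without href</a>
<input type="text" name="q">
"""

soup = BeautifulSoup(html, "html.parser")

exact = soup.select('a[href="/a-link"]')         # value equals "/a-link"
has_href = soup.select("a[href]")                # attribute is present
starts_with = soup.select('a[href^="https://"]') # value starts with "https://"
ends_with = soup.select('a[href$=".pdf"]')       # value ends with ".pdf"
text_inputs = soup.select('[type="text"]')       # any tag with type="text"

print(len(exact), len(has_href), len(starts_with), len(ends_with), len(text_inputs))
# 1 2 1 1 1
```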

Descendant Selector

The descendant selector selects all descendant elements of another element, at any depth.

This is done by separating the ancestor and the descendant tags with a space instead of a >.

body p
<p id="first-paragraph" class="paragraph">Paragraph of text</p>
 <p class="paragraph">Paragraph of text 2</p>
 <p class="paragraph">Nested paragraph</p>

Child Selector

The child selector selects elements that are direct children of another element.

body > p

The difference here is that elements that are not direct children, such as the p nested inside the div, will not be selected.

<p id="first-paragraph" class="paragraph">Paragraph of text</p>
<p class="paragraph">Paragraph of text 2</p>
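The difference between the descendant and child selectors can be verified side by side. This is a minimal sketch using BeautifulSoup on the sample HTML.

```python
from bs4 import BeautifulSoup

html = """<html>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: any <p> inside <body>, at any depth
descendants = [p.get_text() for p in soup.select("body p")]
# Child selector: only <p> elements directly under <body>
children = [p.get_text() for p in soup.select("body > p")]

print(descendants)  # ['Paragraph of text', 'Paragraph of text 2', 'Nested paragraph']
print(children)     # ['Paragraph of text', 'Paragraph of text 2']
```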

Pseudo-Class Selector

The pseudo-class selector uses the colon : to select elements based on their state or position in the document.

For example, you can specify which sibling you want to locate using :nth-of-type(N).

The CSS expression below shows how to select the first div of the body element.

html > body > div:nth-of-type(1)

Unlike Python, which uses zero-based indexing, the index in CSS Selectors starts at 1.
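The sketch below contrasts the two indexing conventions. It assumes BeautifulSoup and a made-up page with three div elements: div:nth-of-type(1) and the Python list index 0 point to the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """<html><body>
<h1>Title</h1>
<div>first</div>
<div>second</div>
<div>third</div>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# CSS index starts at 1
first_css = soup.select_one("html > body > div:nth-of-type(1)").get_text()
second_css = soup.select_one("html > body > div:nth-of-type(2)").get_text()

# Python list index starts at 0
all_divs = soup.select("html > body > div")
first_py = all_divs[0].get_text()

print(first_css, second_css, first_py)  # first second first
```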

There are many pseudo-class selectors, some of which are described in this table.

| Pseudo-class Selector | Description |
| --- | --- |
| :hover | Selects an element when the mouse pointer hovers over it |
| :active | Selects an element when it is being activated (e.g. clicked on) |
| :visited | Selects a link that has been visited by the user |
| :focus | Selects an element when it has focus (e.g. when a form field is selected) |
| :first-child | Selects the first child element of its parent element |
| :last-child | Selects the last child element of its parent element |
| :nth-child(n) | Selects the nth child element of its parent element (where n is a number) |
| :nth-of-type(n) | Selects the nth element of its type (where n is a number) |
| :last-of-type | Selects the last occurrence of an element within its container (e.g. h2:last-of-type) |
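The positional pseudo-classes are easy to confuse: :first-child looks at the position among all siblings, while :nth-of-type and :last-of-type only count siblings with the same tag. The sketch below, which assumes BeautifulSoup and the sample HTML, shows the difference.

```python
from bs4 import BeautifulSoup

html = """<html>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# No match: the first child of <body> is the <h1>, not a <p>
p_first_child = soup.select("body > p:first-child")
# Match: the first <p> among the children of <body>
p_first_of_type = soup.select_one("body > p:nth-of-type(1)").get_text()
# Match: the last <p> among the children of <body>
p_last_of_type = soup.select_one("body > p:last-of-type").get_text()
# First and last children of <body>, whatever their tag
first_any = soup.select_one("body > :first-child").name
last_any = soup.select_one("body > :last-child").name

print(p_first_child)    # []
print(p_first_of_type)  # Paragraph of text
print(p_last_of_type)   # Paragraph of text 2
print(first_any, last_any)  # h1 a
```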

Using CSS Selectors with Python

You can use CSS selectors when Web Scraping with BeautifulSoup or when Web Scraping with Scrapy.

Scraping CSS Selectors with BeautifulSoup

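BeautifulSoup supports CSS selectors through its select() and select_one() methods. To extract elements using a CSS selector, you can do something like:

from bs4 import BeautifulSoup
import requests

# Fetch the HTML
url = 'https://crawler-test.com/'
response = requests.get(url)

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Select every div with the class "panel-header" using a CSS selector
panels = soup.select("div.panel-header")

# Equivalent lookup without a CSS selector
panels = soup.find_all("div", {"class": "panel-header"})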
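To experiment without making a network request, here is a minimal offline sketch of BeautifulSoup's select() and select_one() methods. The HTML snippet is hypothetical and only mimics a page that uses a panel-header class.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<div class="panel">
  <div class="panel-header"><h3>Mobile</h3></div>
  <div class="panel-header"><h3>Description Tags</h3></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matches
headers = soup.select("div.panel-header")
# select_one() returns the first match, or None if nothing matches
first_title = soup.select_one("div.panel-header > h3").get_text()
# Combinators work as in CSS: every h3 directly under a panel header
titles = [h3.get_text() for h3 in soup.select("div.panel-header > h3")]

print(len(headers))  # 2
print(first_title)   # Mobile
print(titles)        # ['Mobile', 'Description Tags']
```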

Scraping CSS Selectors with Scrapy

Scrapy allows you to scrape content using CSS selectors with the css() method from the Selector class.

from scrapy import Selector
html = '''<html>
    <head>
        <title>Title of your web page</title>
    </head>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>'''

# Instantiate Selector
sel = Selector(text=html)

# Define the CSS selector
css_selector = 'body > p#first-paragraph'

sel.css(css_selector).extract()
['<p id="first-paragraph" class="paragraph">Paragraph of text</p>']

Using CSS Selectors with JavaScript

However you execute JavaScript (e.g. in the Chrome DevTools Console, or in Node.js with a DOM implementation), you can use the querySelector or querySelectorAll methods to locate HTML elements with their CSS selectors.

Here is an example using querySelectorAll to find all the links on a page.

var links = document.querySelectorAll('a');
var anchorText = [];

links.forEach(link => {
  anchorText.push(link.textContent);
});

console.log(anchorText);

XPath to CSS Selector Conversion

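| Equivalency | XPath Notation | CSS Selector |
| --- | --- | --- |
| Select by element type | //div | div |
| Select by class name | //div[@class="example"] | div.example |
| Select by ID | //*[@id="example"] | #example |
| Select by attribute | //input[@name="example"] | input[name="example"] |
| Select by attribute value containing | //input[contains(@class, "example")] | input[class*="example"] |
| Select by attribute value starting with | //input[starts-with(@id, "example")] | input[id^="example"] |
| Select by attribute value ending with (XPath 2.0 or later) | //a[ends-with(@href, "example")] | a[href$="example"] |
| Select by following sibling | //div/following-sibling::p | div ~ p |
| Select by descendant | //div//p | div p |
| Select by first child of a type | //div/p[1] | div > p:first-of-type |
| Select by last child of a type | //div/p[last()] | div > p:last-of-type |

Note that //div[@class="example"] only matches when the class attribute is exactly "example", while div.example also matches elements that carry additional classes.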

Difference Between Xpath and CSS Selectors

The main difference between XPath and CSS selectors is that XPath can move both forward and backward through the document tree (for example, from an element to its parent), while a CSS selector can only move forward. In exchange, XPath expressions can be a bit more complex.
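To illustrate what moving backward means, here is a small sketch that uses the limited XPath support in Python's standard library (xml.etree.ElementTree) to go from a paragraph up to its parent with the .. step. It assumes the sample HTML is well-formed XML, which is not true of most real pages.

```python
import xml.etree.ElementTree as ET

html = """<html>
    <body>
        <h1>Heading of the page</h1>
        <p id="first-paragraph" class="paragraph">Paragraph of text</p>
        <p class="paragraph">Paragraph of text 2</p>
        <div><p class="paragraph">Nested paragraph</p></div>
        <a href="/a-link">hyperlink</a>
    </body>
</html>"""

# Parse the sample as XML (works only because it is well-formed)
root = ET.fromstring(html)

# Forward: find the paragraph. Backward: step up to its parent with ".."
parents = root.findall(".//p[@id='first-paragraph']/..")
parent_tags = [element.tag for element in parents]

print(parent_tags)  # ['body']
```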
