Web Scraping with CSS Selectors (with Python Examples)

In this tutorial, you will learn what CSS Selectors are and how to use them in web scraping.

CSS Selectors allow you to locate exact elements within an HTML document. They are also supported by most web scraping tools, which makes them very useful for web scraping.

What is a CSS Selector

CSS Selectors use the Cascading Style Sheets rules to locate elements in an HTML document.

    For example, in CSS, you could use this rule to apply a color to all links on a page that have a class named “myclass”.

    a.myclass {
    	color: blue;
    }
    

    Similarly, in web scraping, you can use the same CSS selector to select all the links with that class:

    a.myclass
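
As a minimal sketch of that idea in Python, the snippet below runs the same selector against a small HTML string with BeautifulSoup's select() method. The HTML and the link URLs are made up for illustration and do not come from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html = """
<a class="myclass" href="/one">First link</a>
<a href="/two">Second link</a>
<a class="myclass other" href="/three">Third link</a>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every element matching the CSS selector
links = soup.select("a.myclass")
hrefs = [link["href"] for link in links]
print(hrefs)  # ['/one', '/three']
```

The third link is matched even though it carries a second class, because a class selector only requires that the class be present.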
    

    Why CSS Selectors are Useful in Web Scraping

    CSS Selectors are very useful in web scraping. They allow you to:

    • locate the element you want to extract from a webpage,
    • identify and extract data from HTML documents quickly,
    • automate the scraping of webpages.

    Simplest Way to Find CSS Selectors in Chrome

    Chrome DevTools has a feature that lets you find the CSS selector of any DOM element without any prior knowledge.

    Open Chrome DevTools with Command + Option + I on macOS (Control + Shift + I on Windows and Linux), or by right-clicking on the page and choosing Inspect.

    Then right-click on any element in the Elements panel and select Copy > Copy selector.

    Basics of CSS Selectors

    CSS selectors are patterns used to describe the location of element(s) within an HTML document.

    They consist of one or more steps (such as a tag name, a class or an ID) joined by combinators such as a white space or the > symbol.

    The last step on the right of the selector expression is the element that you want to select.

    For example, the CSS Selector below locates the h1, found within the HTML body element:

    html > body > h1
    

    Basic Structure of the CSS Selectors

    The basic structure of a CSS Selector is similar to the structure used in XPath, or to the structure of a URL path, where the > symbol replaces the slash symbol (/).

    XPath: /html/body/p
    CSS selector: html > body > p
    

    The CSS locator is represented by a series of steps. Each step is separated by the greater-than (>) symbol, which moves forward one generation.
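
To make the parallel concrete, here is a small sketch that rewrites a child-only CSS path as an absolute XPath. The function name css_child_path_to_xpath is a hypothetical helper written for this example, and it only handles plain tag names joined by the > symbol (no classes, IDs or other combinators).

```python
def css_child_path_to_xpath(css_path: str) -> str:
    """Convert a child-only CSS path such as 'html > body > p' to an absolute XPath."""
    # Split on the child combinator and drop the surrounding white space
    steps = [step.strip() for step in css_path.split(">")]
    # Each step becomes one level of the XPath, separated by slashes
    return "/" + "/".join(steps)


xpath = css_child_path_to_xpath("html > body > p")
print(xpath)  # /html/body/p
```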

    Types of CSS Selectors (Filters)

    There are several types of CSS selectors that can be used to target HTML elements on a web page.

    • Tag Selector
    • Class Selector
    • ID Selector
    • Attribute Selector
    • Descendant Selector
    • Child Selector
    • Pseudo-Class Selector

    Basic CSS Selectors Cheatsheet

    | Selector | Description | Example | Explanation |
    | --- | --- | --- | --- |
    | Tag Selector | Selects elements based on their tag name. | p | Selects all <p> elements. |
    | Class Selector | Selects elements based on their class name. | .example | Selects all elements with the class name “example”. |
    | ID Selector | Selects an element based on its ID. | #example | Selects the element with the ID “example”. |
    | Attribute Selector | Selects elements based on their attribute and value. | [type="text"] | Selects all elements with the attribute “type” and value “text”. |
    | Descendant Selector | Selects elements that are descendants of another element. | div p | Selects all <p> elements that are descendants of a <div> element. |
    | Child Selector | Selects elements that are direct children of another element. | ul > li | Selects all <li> elements that are direct children of a <ul> element. |
    | Pseudo-Class Selector | Selects elements based on their state or position in the document. | a:hover | Selects all <a> elements when the mouse is hovering over them. |

    Tag Selector

    Tag selectors can be used to target specific HTML elements.

    For example, you could use the title, h1, p or a tag selectors in order to select each element.

    <html>
        <head>
            <title>Title of your web page</title>
        </head>
        <body>
            <h1>Heading of the page</h1>
            <p id="first-paragraph" class="paragraph">Paragraph of text</p>
            <p class="paragraph">Paragraph of text 2</p>
            <div><p class="paragraph">Nested paragraph</p></div>
            <a href="/a-link">hyperlink</a>
        </body>
    </html>
    

    To select the h1, here is the selector that you would use.

    h1

    <h1>Heading of the page</h1>
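
A short sketch of tag selectors in Python, assuming BeautifulSoup is installed and using a trimmed copy of the sample HTML above:

```python
from bs4 import BeautifulSoup

html = """<html>
    <head><title>Title of your web page</title></head>
    <body>
        <h1>Heading of the page</h1>
        <p class="paragraph">Paragraph of text</p>
    </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# A tag selector is simply the tag name
headings = [str(tag) for tag in soup.select("h1")]
title_text = soup.select_one("title").get_text()

print(headings)    # ['<h1>Heading of the page</h1>']
print(title_text)  # Title of your web page
```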
    

    Class Selector

    This example shows how to use the dot notation to select the elements with a certain class.

    This expression selects all the <p> HTML tags that have a class attribute with “paragraph” as its value.

    p.paragraph

    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
    <p class="paragraph">Nested paragraph</p>
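
The same class selector can be tried in Python. This is a minimal sketch with BeautifulSoup, run on the body of the sample HTML:

```python
from bs4 import BeautifulSoup

html = """<body>
    <h1>Heading of the page</h1>
    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
    <div><p class="paragraph">Nested paragraph</p></div>
    <a href="/a-link">hyperlink</a>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# tag.class matches <p> elements whose class attribute contains "paragraph"
texts = [p.get_text() for p in soup.select("p.paragraph")]
print(texts)
# ['Paragraph of text', 'Paragraph of text 2', 'Nested paragraph']
```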
    

    ID Selector

    An ID Selector uses the hash symbol (#) to target an element by its unique id.

    p#first-paragraph

    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
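
Because an ID should match a single element, select_one() is a natural fit in BeautifulSoup. A small sketch on part of the sample HTML:

```python
from bs4 import BeautifulSoup

html = """<body>
    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching element (or None if there is none)
with_tag = soup.select_one("p#first-paragraph")
without_tag = soup.select_one("#first-paragraph")

print(with_tag.get_text())        # Paragraph of text
print(with_tag is without_tag)    # both selectors find the same element
```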
    

    Attribute Selector

    An attribute Selector selects elements based on their attribute and value.

    It uses the syntax:

    tag-name[attribute="attribute-value"]
    

    Example:

    a[href="/a-link"]

    <a href="/a-link">hyperlink</a>
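
Here is a sketch of attribute selectors in Python with BeautifulSoup. Besides the attribute="value" form shown above, it also uses the bare [attribute] form, which matches any element that has the attribute regardless of its value.

```python
from bs4 import BeautifulSoup

html = """<body>
    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
    <a href="/a-link">hyperlink</a>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# Match on an exact attribute value
link_texts = [a.get_text() for a in soup.select('a[href="/a-link"]')]

# Match on the mere presence of an attribute
ids = [p["id"] for p in soup.select("p[id]")]

print(link_texts)  # ['hyperlink']
print(ids)         # ['first-paragraph']
```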
    

    Descendant Selector

    The descendant selector selects all descendant elements of another element.

    This is done by writing the ancestor and the descendant separated by a space instead of a >.

    body p

    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
    <p class="paragraph">Nested paragraph</p>
    

    Child Selector

    The child selector selects elements that are direct children of another element.

    body > p
    

    The difference here is that elements that are not direct children, e.g. nested elements like the p inside the div, are not selected.

    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
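
The difference between the descendant and child selectors is easy to check side by side. A minimal sketch with BeautifulSoup on the body of the sample HTML:

```python
from bs4 import BeautifulSoup

html = """<body>
    <p id="first-paragraph" class="paragraph">Paragraph of text</p>
    <p class="paragraph">Paragraph of text 2</p>
    <div><p class="paragraph">Nested paragraph</p></div>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: any <p> below <body>, at any depth
descendants = [p.get_text() for p in soup.select("body p")]

# Child selector: only <p> elements directly under <body>
children = [p.get_text() for p in soup.select("body > p")]

print(descendants)  # ['Paragraph of text', 'Paragraph of text 2', 'Nested paragraph']
print(children)     # ['Paragraph of text', 'Paragraph of text 2']
```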
    

    Pseudo-Class Selector

    The pseudo-class selector uses the colon : to select elements based on their state or position in the document.

    For example, you can choose which sibling you want to locate using :nth-of-type(N).

    The CSS expression below shows how to select the first div of the body element.

    html > body > div:nth-of-type(1)
    

    Unlike Python, which uses zero-based indexing, the index in CSS Selectors starts at 1.
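
The indexing difference can be seen in a short sketch. The HTML below is a made-up example with two div elements, and the selectors are run with BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two <div> siblings separated by a <p>
html = """<html><body>
    <div>First div</div>
    <p>A paragraph</p>
    <div>Second div</div>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# CSS counts from 1: nth-of-type(1) is the first <div> among its siblings
first_div = soup.select_one("html > body > div:nth-of-type(1)")
second_div = soup.select_one("html > body > div:nth-of-type(2)")

# Python counts from 0: index 0 of the result list is that same first <div>
all_divs = soup.select("html > body > div")

print(first_div.get_text())    # First div
print(second_div.get_text())   # Second div
print(all_divs[0].get_text())  # First div
```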

    There are many pseudo-class selectors, some of which are described in this table.

    | Pseudo-class Selector | Description |
    | --- | --- |
    | :hover | Selects an element when the mouse pointer hovers over it |
    | :active | Selects an element when it is being activated (e.g. clicked on) |
    | :visited | Selects a link that has been visited by the user |
    | :focus | Selects an element when it has focus (e.g. when a form field is selected) |
    | :first-child | Selects the first child element of its parent element |
    | :last-child | Selects the last child element of its parent element |
    | :nth-child(n) | Selects the nth child element of its parent element (where n is a number) |
    | :nth-of-type(n) | Selects the nth element of its type (where n is a number) |
    | :last-of-type | Selects the last occurrence of an element within its container (e.g. h2:last-of-type) |

    Using CSS Selectors with Python

    You can use CSS selectors when Web Scraping with BeautifulSoup or when Web Scraping with Scrapy.

    Scraping CSS Selectors with BeautifulSoup

    BeautifulSoup supports CSS selectors through its select() and select_one() methods. For example, to extract every div with the class panel-header, you can do something like:

    from bs4 import BeautifulSoup
    import requests

    # Fetch the HTML
    url = 'https://crawler-test.com/'
    response = requests.get(url)

    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # Select elements with a CSS selector
    soup.select('div.panel-header')

    # Equivalent lookup without a CSS selector
    soup.find_all("div", {"class": "panel-header"})
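
BeautifulSoup also provides select() and select_one(), which take a CSS selector directly. The sketch below runs offline on a made-up HTML string. The markup is an assumption for illustration and does not reflect the real structure of any website.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; a real page will be structured differently
html = """
<div class="panel">
    <div class="panel-header"><h3>Links</h3></div>
    <a href="/page-1">Page 1</a>
    <a href="/page-2">Page 2</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match, or None when nothing matches
header = soup.select_one("div.panel-header")
header_text = header.get_text(strip=True) if header else None
missing = soup.select_one("div.does-not-exist")

# select() returns a list with all the matches
hrefs = [a["href"] for a in soup.select("div.panel > a")]

print(header_text)  # Links
print(missing)      # None
print(hrefs)        # ['/page-1', '/page-2']
```

Checking for None before reading text or attributes avoids an AttributeError when a selector matches nothing.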
    

    Scraping CSS Selectors with Scrapy

    Scrapy allows you to scrape content using CSS selectors with the css() method from the Selector class.

    from scrapy import Selector
    html = '''<html>
        <head>
            <title>Title of your web page</title>
        </head>
        <body>
            <h1>Heading of the page</h1>
            <p id="first-paragraph" class="paragraph">Paragraph of text</p>
            <p class="paragraph">Paragraph of text 2</p>
            <div><p class="paragraph">Nested paragraph</p></div>
            <a href="/a-link">hyperlink</a>
        </body>
    </html>'''
    
    # Instantiate Selector
    sel = Selector(text=html)
    
    # Define the CSS selector
    css_selector = 'body > p#first-paragraph'

    sel.css(css_selector).extract()

    ['<p id="first-paragraph" class="paragraph">Paragraph of text</p>']
    

    Using CSS Selectors with JavaScript

    Whatever way you are executing JavaScript (e.g. Chrome DevTools Console, NodeJS, etc.), you can use the querySelector or querySelectorAll methods to locate HTML elements with their CSS selectors.

    Here is an example using querySelectorAll to find all the links on a page.

    var links = document.querySelectorAll('a');
    var anchorText = [];
    
    links.forEach(link => {
      anchorText.push(link.textContent);
    });
    
    console.log(anchorText);
    

    XPath to CSS Selector Conversion

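    | Equivalency | XPath Notation | CSS Selector |
    | --- | --- | --- |
    | Select by element type | //div | div |
    | Select by class name | //div[@class="example"] | div.example |
    | Select by ID | //*[@id="example"] | #example |
    | Select by attribute | //input[@name="example"] | input[name="example"] |
    | Select by attribute value containing | //input[contains(@class, "example")] | input[class*="example"] |
    | Select by attribute value starting with | //input[starts-with(@id, "example")] | input[id^="example"] |
    | Select by attribute value ending with | //a[ends-with(@href, "example")] | a[href$="example"] |
    | Select by following sibling | //div/following-sibling::p | div ~ p |
    | Select by descendant | //div//p | div p |
    | Select by first child of its type | //div/p[1] | div > p:first-of-type |
    | Select by last child of its type | //div/p[last()] | div > p:last-of-type |

    Some pairs are only approximate. @class="example" matches only when the class attribute is exactly "example", while div.example also matches elements that carry additional classes. The ends-with() function requires XPath 2.0.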
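
A few of these pairs can be checked in Python. The sketch below is only a rough comparison: it uses the standard library's xml.etree.ElementTree, which supports a limited subset of XPath (paths must be relative, hence the leading dot), together with BeautifulSoup for the CSS side. The HTML is a made-up, well-formed snippet. Note that p[1] in XPath picks the first p child, which corresponds to :first-of-type in CSS.

```python
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# Hypothetical, well-formed markup so the stdlib XML parser can read it too
html = """<html><body>
<div class="example"><p>one</p><p>two</p></div>
<input name="example" />
<p>three</p>
</body></html>"""

root = ET.fromstring(html)
soup = BeautifulSoup(html, "html.parser")

# (XPath in ElementTree's subset, CSS selector expected to match the same elements)
pairs = [
    (".//div", "div"),
    (".//div[@class='example']", "div.example"),
    (".//input[@name='example']", 'input[name="example"]'),
    (".//div//p", "div p"),
    (".//div/p[1]", "div > p:first-of-type"),
    (".//div/p[last()]", "div > p:last-of-type"),
]

results = []
for xpath, css in pairs:
    # Compare tag name and direct text of every match, in document order
    xpath_hits = [(el.tag, el.text) for el in root.findall(xpath)]
    css_hits = [(el.name, el.string) for el in soup.select(css)]
    results.append(xpath_hits == css_hits)

print(results)
```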

    Difference Between Xpath and CSS Selectors

    The difference between XPath and CSS selectors is that XPath can move both forward and backward in the document (for example, from an element to its parent), while a CSS selector can only move forward. In exchange, XPath expressions can be a bit more complex.
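
As a small illustration of moving backward, the sketch below uses the XPath ".." step to go from a matched element to its parent. It relies on the standard library's xml.etree.ElementTree, which implements only a subset of XPath, and on a made-up, well-formed HTML snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for illustration
html = """<html><body>
<div><p id="nested">Nested paragraph</p></div>
<a href="/a-link">hyperlink</a>
</body></html>"""

root = ET.fromstring(html)

# ".." steps back from the matched <p> to its parent element
parents = root.findall(".//p[@id='nested']/..")
parent_tags = [el.tag for el in parents]

print(parent_tags)  # ['div']
```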
