A headless browser is a web browser without a graphical user interface (GUI). Simply put, it loads and renders web pages in the background without ever displaying a window. This is called running the browser in headless mode.
One of the first widely used headless web browsers, PhantomJS, was created by Ariya Hidayat and released in 2011. Since then, other headless browsers such as Headless Chrome and Headless Firefox have been developed.
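To make "headless mode" concrete, here is a minimal sketch that runs Headless Chrome straight from Python, with no automation framework involved. It assumes a Chrome or Chromium binary is installed under one of the common names listed below. Those names and the helper functions (find_chrome, build_headless_command, dump_dom) are hypothetical illustrations, not part of any library. The --headless and --dump-dom switches are Chrome command-line flags: the browser loads the page without opening a window and prints the page's DOM to standard output.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical list of common Chrome/Chromium executable names; adjust for your system.
CANDIDATE_BINARIES = ["google-chrome", "chromium", "chromium-browser", "chrome"]


def find_chrome() -> Optional[str]:
    """Return the path of the first Chrome/Chromium binary found on PATH, or None."""
    for name in CANDIDATE_BINARIES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_headless_command(binary: str, url: str) -> List[str]:
    """Build a command that loads `url` without a window and prints the DOM."""
    return [binary, "--headless", "--dump-dom", url]


def dump_dom(url: str, timeout: float = 30.0) -> str:
    """Load `url` in headless Chrome and return the DOM that Chrome prints to stdout."""
    binary = find_chrome()
    if binary is None:
        raise FileNotFoundError("No Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        build_headless_command(binary, url),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output as text
        timeout=timeout,      # do not hang forever on a slow page
        check=True,           # raise if Chrome exits with an error
    )
    return result.stdout
```

Calling dump_dom("https://example.com") on a machine with Chrome installed should return the page's HTML as a string, and no browser window appears at any point.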
This article was written in partnership with Manthan Koolwal, founder of Scrapingdog.
Why use a Headless Browser?
Headless browsers are often used to automate tasks that require a full web browser, such as website testing and web scraping. Headless browsers can be used to:
- Perform automated tests on a website.
- Scrape websites, including pages that build their content with JavaScript, with a lower risk of being detected than simple scripts.
Headless Browser Frameworks (Examples)
According to Google Trends, Selenium is the most popular headless browser framework. The most common headless browser frameworks are:
- Selenium Headless Browser
- Puppeteer Headless Browser
- Playwright Headless Browser
These frameworks let developers control a browser from code, in headless or regular mode, so they can automate actions, run tests and scrape data.
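As an illustration of how one of these frameworks drives a headless browser, here is a minimal Selenium sketch. It assumes a recent Selenium 4 release, which can locate or download a matching chromedriver on its own, and a local Chrome installation. The page_title function and the HEADLESS_ARGS list are hypothetical names chosen for this example; ChromeOptions, add_argument, get, title and quit are standard Selenium calls.

```python
from typing import List

# Chrome switches passed through Selenium (hypothetical selection for this example).
# "--headless=new" selects Chrome's newer headless mode; the window size gives
# the invisible browser a realistic viewport.
HEADLESS_ARGS: List[str] = ["--headless=new", "--window-size=1280,800"]


def page_title(url: str) -> str:
    """Open `url` in headless Chrome via Selenium and return the page title."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)      # load the page, running its JavaScript
        return driver.title  # read the <title> of the rendered page
    finally:
        driver.quit()        # always shut down the browser process
```

Removing "--headless=new" from HEADLESS_ARGS runs the same code with a visible browser window, which is often handy while debugging a test or a scraper.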
Headless Browsers in Web Scraping
Headless browsers are useful in web scraping. Some websites use anti-scraping measures that detect and block requests coming from automated scripts. Because a headless browser executes JavaScript and loads pages the way a regular browser does, it can emulate human-like behaviour, making it harder for websites to distinguish between scraping bots and genuine users. This is not a guarantee, though, since default headless configurations can still be fingerprinted and detected.
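Since there is no GUI to display, a headless browser generally consumes fewer resources than a regular browser window, which matters when scraping a website at scale. This makes it practical to run multiple instances of the browser in parallel, although each instance still uses CPU and memory, so the number of concurrent instances should be limited to what the machine can handle.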
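One way to run several headless browser instances while keeping CPU and memory under control is to cap concurrency with a worker pool. The sketch below assumes fetch is any function that takes a URL and returns the page content, for example a function that wraps a headless browser; scrape_many and its parameters are hypothetical names. Threads are a reasonable fit because the heavy work happens in the browser processes, so Python mostly waits on them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def scrape_many(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Call `fetch` on every URL, with at most `max_workers` calls running at once.

    Each call is assumed to start its own headless browser instance, so
    `max_workers` is also the maximum number of browsers alive at the same time.
    Results are keyed by URL (duplicate URLs collapse into one entry).
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        results = pool.map(fetch, url_list)
        return dict(zip(url_list, results))
```

With max_workers=4, scraping 100 URLs never has more than four browsers open at once; raising or lowering that single number trades throughput against resource usage.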
SEO Strategist at Tripadvisor, ex-Seek (Melbourne, Australia). Specialized in technical SEO. Writes about Python, information retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.