Introduction to HTML for SEO and Web Scraping

HTML, or HyperText Markup Language, is the markup language used to create web pages.

This tutorial is a simple introduction to HTML focusing on SEO.

You will learn the basics needed to read web pages and spot what is important for SEO, without diving too deep into HTML, as there are plenty of video tutorials on the subject.




Learning HTML will give you some of the basic knowledge required for web development, SEO and web scraping.

What is HTML

HTML stands for HyperText Markup Language. It is the standard markup language used to create web pages.

What is a Markup Language?

A markup language is a text-encoding system using symbols to control the structure and formatting of a text document.

Examples of markup languages are:

  • HTML
  • XML
  • GML

The Role of Web Browsers

Regardless of the browser that you use, its main purpose is to interpret HTML, CSS and JavaScript files and convert them into a usable website.

When the browser receives an HTML file, it reads it, renders it and displays the page to the user.

What is a web browser: A web browser, whether it is Firefox, Chrome or Safari, is software used to access the Internet by reading and interpreting the HTML, CSS and JavaScript files sent by the host server.


Simple HTML Example

Let’s look at a simple HTML example to understand how a web browser interprets the code and displays it as a web page.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        Simple HTML SEO Tutorial
        <!--Single Line comment-->   
        <!-- 
        Multiple lines
        Comment
        -->  
    </body>
</html>

If you don’t have a text editor installed, you can head over to codepen.io to play around with this tutorial.
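A scraper reads this same page as plain text. Here is a minimal sketch using Python's built-in html.parser module (an assumption: the tutorial does not prescribe a tool), where PageReader is a hypothetical class name. It shows that the title lives in the head, that the body holds the visible text, and that comments are in the source code but never displayed.

```python
from html.parser import HTMLParser

# The simple page from the example above, stored as a Python string.
SIMPLE_PAGE = """<!DOCTYPE html>
<html>
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        Simple HTML SEO Tutorial
        <!--Single Line comment-->
        <!--
        Multiple lines
        Comment
        -->
    </body>
</html>"""


class PageReader(HTMLParser):
    """Hypothetical helper: collects the doctype, title, body text and comments."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.title = ""
        self.body_text = []
        self.comments = []
        self._open_tags = []  # tags that are currently open, outermost first

    def handle_decl(self, decl):
        self.doctype = decl  # "DOCTYPE html"

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip the whitespace used for indentation
        if "title" in self._open_tags:
            self.title += text
        elif "body" in self._open_tags:
            self.body_text.append(text)

    def handle_comment(self, data):
        # Comments are part of the source code but are never displayed.
        self.comments.append(data.strip())


reader = PageReader()
reader.feed(SIMPLE_PAGE)
reader.close()
print(reader.title)          # Simple SEO Title
print(reader.body_text)      # ['Simple HTML SEO Tutorial']
print(len(reader.comments))  # 2
```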

Basic HTML Tags

HTML code is made of HTML tags: keywords that begin and end with angle brackets <>. HTML tags usually come in pairs:

  • opening tag: <tag>
  • closing tag: </tag>

Let’s look at a few tags:

  • The declaration tag: <!DOCTYPE html> is a declaration that must be at the top of the file. It tells the browser which version of HTML the page uses (here, HTML5).
  • The root tag: <html></html>. The opening <html> tag defines where the HTML begins and the closing </html> tag defines where it ends.

The content of the HTML is divided into two parts: the head and the body of the HTML.

  • The head tag: <head></head>. The <head> contains information about the page that is not visible to the user. This information is called metadata: it describes what the web page is about and how to display it to the user.
  • The body tag: <body></body>. The <body> tag defines the body of the HTML. This is the portion of the HTML that is visible to the user.
  • The title tag: <title></title>. The <title> tag is a required tag placed in the head of the HTML. It defines the title of your web page. The browser displays it in the page's tab, and Google typically uses it as the title shown in search results. More on that later.

Opening and Closing Tags

The content of the HTML is usually surrounded by an opening tag <> and a closing tag </>. Below, the content is surrounded by the paragraph tags <p> and </p>.

<p>I love SEO!</p>

Some HTML elements, such as line breaks, can't contain any content, so they don't need a closing tag. These are called void elements, often referred to as “self-closing” tags. They are sometimes written with a forward slash before the closing bracket (<tag />), although in HTML5 that slash is optional. Here are some examples.

<br /> 
<hr /> 
<meta charset="UTF-8" />
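To see which tags need a closing tag and which don't, a snippet can be checked programmatically. This is a minimal sketch with Python's built-in html.parser; TagBalanceChecker and check_tags are hypothetical names, VOID_ELEMENTS is only a subset of HTML's void elements, and the checker is deliberately strict: it does not model HTML's optional end tags (such as a missing </li>).

```python
from html.parser import HTMLParser

# A subset of HTML's void ("self-closing") elements: they never take a closing tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical strict checker for unclosed or unexpected tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # non-void tags waiting for their closing tag
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # In HTML5 the trailing slash is ignored: "<br />" is fine because br is
        # void, but "<div />" still opens a <div> that needs a closing tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected closing tag </{tag}>")

    def close(self):
        super().close()
        self.errors.extend(f"<{tag}> is never closed" for tag in self.stack)
        self.stack = []


def check_tags(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


print(check_tags('<p>I love SEO!</p><br /><hr /><meta charset="UTF-8" />'))  # []
print(check_tags("<p>I love SEO!"))  # ['<p> is never closed']
print(check_tags("<div />"))         # ['<div> is never closed']
```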

Let’s See How this Looks in a Web Browser

If we save the HTML code above as a .html file (any text or code editor can do that), we can open it in a browser to see how it behaves.

Example of HTML Page in the Browser

Notice the tab above the page: it displays the “Simple SEO Title” title that I gave the page between the <title></title> tags.

Press Command + Option + I on Mac, or Ctrl + Shift + I on Windows, to open the browser's developer tools and inspect the DOM.

Rendering of the HTML by a browser

The Hierarchy of the HTML

The HTML has a tree-like structure where HTML tags can be nested within each other.

For example, in the structure html > body > div > p, the body tag is the parent of the div tag, and the div tag is both the child of the body tag and the parent of the p tag.

Parents and Children

In that structure, the <div> and the <p> are both descendants of the <body> HTML tag, but only the <div> is a direct child. The <p> inside the <div> is a grandchild (second generation) of the <body> tag and a third-generation descendant of the <html> tag.

Thus, the html tag has 3 generations of descendants: body, div and p.

Siblings

In the hierarchy, HTML tags have siblings when they are nested at the same level, within the same parent HTML tag.
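The same family tree can be built in code, which is how scraping tools let you move from a tag to its parent, children or siblings. This is a minimal sketch with Python's built-in html.parser; Node and TreeBuilder are hypothetical names, the markup is a made-up example that puts an <h1> next to the <div> so the div has a sibling, and it assumes a single root element with every tag properly closed.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}  # never have children


class Node:
    """One HTML tag in the tree, with links to its parent and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def path(self):
        """Ancestors from the root down to this tag, e.g. 'html > body > div > p'."""
        names = []
        node = self
        while node is not None:
            names.append(node.tag)
            node = node.parent
        return " > ".join(reversed(names))

    def siblings(self):
        """Other children of the same parent."""
        if self.parent is None:
            return []
        return [child for child in self.parent.children if child is not self]


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds a Node tree, assuming one root and closed tags."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None  # the tag we are currently inside

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        if self.current is None:
            self.root = node
        else:
            self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # step down into the new tag

    def handle_endtag(self, tag):
        if self.current is not None and self.current.tag == tag:
            self.current = self.current.parent  # step back up to the parent


builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1><div><p>Text</p></div></body></html>")
builder.close()

body = builder.root.children[0]
h1, div = body.children
p = div.children[0]
print(p.path())                         # html > body > div > p
print([c.tag for c in body.children])   # ['h1', 'div']
print([s.tag for s in div.siblings()])  # ['h1']
```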

HTML Tags and Attributes

So far, we have seen basic HTML tags that follow the bracket-tag-name structure (<tag-name>). However, HTML tags can also contain attributes.

HTML tag attributes provide additional information about an element, or instructions on how to handle it.

<tag attribute="attribute information"> ... </tag>

The parts of an HTML tag:

  • Tag name: html, body, div, …
  • Attribute name: href, id, class, …
  • Attribute information: the value between the quotes

Most Common HTML Tag Attributes

Attributes can be used for identification of HTML tags, linking or referencing, and more.

Many HTML tags contain an id and/or a class attribute. The <a> tag has a special and very important attribute: the href attribute.

<a id="unique-id" class="no-unique-id" href="/hyperlink"> ... </a>
  • id: unique identifier of an HTML tag; it should appear only once on a page.
  • class: class that can identify multiple similar HTML tags; it doesn't need to be unique.
  • href: the URL that the link points to.

Not all attributes can be used with all tags.
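In a scraper, attributes are what you use to find a specific tag. Here is a minimal sketch with Python's built-in html.parser, which hands over each start tag's attributes as a list of (name, value) pairs. AttributeIndex, by_id and by_class are hypothetical names, and the <p> tag in the sample markup is made up to show a class shared by two tags.

```python
from html.parser import HTMLParser


class AttributeIndex(HTMLParser):
    """Hypothetical helper: remembers every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []  # (tag name, {attribute name: value}) in document order

    def handle_starttag(self, tag, attrs):
        # attrs looks like [("id", "unique-id"), ("class", "no-unique-id"), ...]
        self.elements.append((tag, dict(attrs)))

    def by_id(self, element_id):
        return [tag for tag, attrs in self.elements if attrs.get("id") == element_id]

    def by_class(self, class_name):
        # A class attribute can hold several space-separated class names.
        return [tag for tag, attrs in self.elements
                if class_name in (attrs.get("class") or "").split()]


index = AttributeIndex()
index.feed('<a id="unique-id" class="no-unique-id" href="/hyperlink"> ... </a>'
           '<p class="no-unique-id intro">Text</p>')
index.close()
print(index.by_id("unique-id"))        # ['a']
print(index.by_class("no-unique-id"))  # ['a', 'p']
print(index.elements[0][1]["href"])    # /hyperlink
```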

Important HTML Tags for SEO

Let’s look at the most important tags that SEOs should know about.

Links

Links in a page are represented using the <a> tag with an attribute called href. Links are probably the most important HTML tag to know about, as the web is discovered by following links from page to page, and from website to website.

Without links, search engines have a hard time discovering your website.

Not all links have the same value, though. Some have a nofollow attribute, some are created via JavaScript, some require a form submission, etc.

I will not go into depth on this subject, as it is an entire SEO topic by itself, but just know that a proper link should have an href attribute and a relevant anchor text.

<a href="https://www.jcchouinard.com/">Anchor Text</a>

The href value is the URL where users go when they click on the link. The anchor text is what Google uses as context to understand what the linked page is about.

One example of how Google uses anchor text is the sitelinks it shows in search results.

Anchor Text SiteLinks
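Crawlers and scrapers discover pages by reading exactly these two parts of each link. As a minimal sketch, assuming Python's built-in html.parser and urllib.parse, the hypothetical LinkExtractor below collects the href and anchor text of every <a> tag and resolves relative hrefs against the page URL with urljoin. The page URL and the "/contact" link are made-up examples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects (absolute URL, anchor text) for each <a href>."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None when the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = " ".join("".join(self._text).split())  # tidy whitespace
            self.links.append((urljoin(self.page_url, self._href), anchor_text))
            self._href = None


extractor = LinkExtractor("https://www.jcchouinard.com/blog/")
extractor.feed('<a href="https://www.jcchouinard.com/">Anchor Text</a>'
               '<a href="/contact">Contact  me</a><a>No href</a>')
extractor.close()
print(extractor.links)
# [('https://www.jcchouinard.com/', 'Anchor Text'), ('https://www.jcchouinard.com/contact', 'Contact me')]
```

The <a> without an href is skipped, since it gives a crawler no URL to follow.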

Here are a few examples of good and bad links, as explained by Google at Google I/O.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
        <link rel="alternate" href="https://www.jcchouinard.com/en" hreflang="en-ca"> <!--International SEO -->
        <link rel="canonical" href="https://www.jobillico.com/fr"> <!--Tell Google your preferred page -->
    </head>
    <body>
        <h1>Blog post title</h1>
        <a href="/good-link">Will be crawled and bring SEO value</a><br />
        <a href="/good-link" rel="nofollow">Will be crawled, no SEO value</a><br />
        <span onclick="changePage('bad-link')">Not crawled</span><br />
        <a onclick="changePage('bad-link')">Not crawled</a><br />
        <a href="/good-link" onclick="changePage('/good-link')">Crawled</a><br />
        <a href="/bad-link#see-more">Fragments are really hard on bots. Please stop!</a><br />
        <a href="/good-link?sort=latest-posts">If parameters don’t change content. Use a canonical</a>
    </body>
</html>
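The labels in the comments of this example can be turned into a rough link audit. This sketch, assuming Python's built-in html.parser and urllib.parse, only mirrors those labels: the rules in the hypothetical classify_link function are a simplification of the example, not Google's actual crawling logic.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def classify_link(tag, attrs):
    """Hypothetical rules that mirror the labels of the example above."""
    href = attrs.get("href")
    if tag != "a" or not href:
        return "not crawled"  # without <a href>, there is no URL to follow
    url = urlsplit(href)
    if url.fragment:
        return "fragment: hard on bots"
    if "nofollow" in (attrs.get("rel") or "").lower().split():
        return "crawled, no SEO value"
    if url.query:
        return "crawled, check canonical"  # parameters may not change the content
    return "crawled"


class LinkAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Audit <a> tags and any other element made clickable with onclick.
        if tag == "a" or "onclick" in attrs:
            self.labels.append(classify_link(tag, attrs))


BODY = """
<a href="/good-link">Will be crawled and bring SEO value</a>
<a href="/good-link" rel="nofollow">Will be crawled, no SEO value</a>
<span onclick="changePage('bad-link')">Not crawled</span>
<a onclick="changePage('bad-link')">Not crawled</a>
<a href="/good-link" onclick="changePage('/good-link')">Crawled</a>
<a href="/bad-link#see-more">Fragments are really hard on bots</a>
<a href="/good-link?sort=latest-posts">Parameters</a>
"""

auditor = LinkAuditor()
auditor.feed(BODY)
auditor.close()
for label in auditor.labels:
    print(label)
```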

Canonicals

The canonical tag is a tag placed in the <head> to tell Google which page you prefer when you have more than one page with essentially the same content.

It is a super important tag to keep track of, as the URL in the canonical tag is the page that you ask Google to index. Google treats it as a strong signal rather than a strict rule, so it may still choose another page.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
        <link rel="canonical" href="https://www.jobillico.com/fr">
    </head>
    <body>
    </body>
</html>
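When auditing pages, the canonical URL is one of the first things to extract. A minimal sketch, assuming Python's built-in html.parser; CanonicalFinder and get_canonical are hypothetical names, and returning None for zero or several canonicals is a design choice so that the caller can flag those pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: collects the href of every <link rel="canonical">."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values; compare them in lowercase.
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonicals.append(attrs.get("href"))


def get_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    # Design choice: zero or several canonicals both return None, to be flagged.
    if len(finder.canonicals) == 1:
        return finder.canonicals[0]
    return None


page = ('<head><title>Simple SEO Title</title>'
        '<link rel="canonical" href="https://www.jobillico.com/fr"></head>')
print(get_canonical(page))  # https://www.jobillico.com/fr
```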

Alternate

Alternate tags are added to the <head> to tell Google that your page has an equivalent version at another URL. The usual cases are when you operate in international locations, or when you have a mobile-specific website.

Hreflang

Hreflang tags should be reciprocal. If page A has an alternate link to page B, page B should have an alternate link back to page A.

On Page A

<link rel="alternate" href="/page-B" hreflang="en-ca">
<link rel="canonical" href="/page-A">

On Page B

<link rel="alternate" href="/page-A" hreflang="en-ca">
<link rel="canonical" href="/page-B">
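Checking reciprocity by hand gets tedious on large sites. Here is a minimal sketch in Python, assuming you have already collected, for each page, the URLs declared in its alternate tags (for example by crawling the pages); missing_return_links is a hypothetical function name.

```python
def missing_return_links(alternates):
    """
    Hypothetical check. `alternates` maps each page URL to the list of URLs
    declared in its <link rel="alternate" hreflang="..."> tags. Returns the
    (page, target) pairs where page points to target but target does not
    point back to page.
    """
    problems = []
    for page, targets in alternates.items():
        for target in targets:
            if page not in alternates.get(target, []):
                problems.append((page, target))
    return problems


# Page A and page B from the example above: each points back to the other.
reciprocal = {"/page-A": ["/page-B"], "/page-B": ["/page-A"]}
# Here page B forgot its alternate link back to page A.
one_way = {"/page-A": ["/page-B"], "/page-B": []}

print(missing_return_links(reciprocal))  # []
print(missing_return_links(one_way))     # [('/page-A', '/page-B')]
```

A target page that was never collected is also reported, since no return link can be confirmed for it.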

Mobile

Some websites serve their mobile version on a separate URL. In that situation, add an alternate tag to the page.

On the desktop page, add an alternate tag whose media attribute targets small screens:

<link rel="alternate" media="only screen and (max-width: 480px)" href="https://www.jcchouinard.com/mobile/">
<link rel="canonical" href="https://www.jcchouinard.com/">

On the mobile page, point the canonical back to the desktop page:

<link rel="canonical" href="https://www.jcchouinard.com/">

Meta Robots

The meta robots tag is super important. It lets you tell robots what you want them to do with your page. Adding noindex, nofollow to the content attribute tells search engines (like Google) not to index the page and not to follow any of its links.

With this tag, your web page will not be shown in Google search results.

<head>
<meta name="robots" content="noindex, nofollow">
</head>
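An SEO crawler reads this tag before deciding what to do with a page. This minimal sketch uses Python's built-in html.parser; RobotsMetaReader and robots_status are hypothetical names. It only reads <meta name="robots"> (not crawler-specific meta tags or HTTP headers), treats a missing tag as the default index, follow, and treats none as noindex, nofollow, as Google documents it.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # "noindex, nofollow" -> {"noindex", "nofollow"}
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_status(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    directives = reader.directives
    # Without any directive, the defaults are index and follow.
    return {
        "indexable": not (directives & {"noindex", "none"}),
        "follow_links": not (directives & {"nofollow", "none"}),
    }


print(robots_status('<head><meta name="robots" content="noindex, nofollow"></head>'))
# {'indexable': False, 'follow_links': False}
print(robots_status("<head><title>No robots tag</title></head>"))
# {'indexable': True, 'follow_links': True}
```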

Title

As we have seen, the title is displayed in the browser tab, but search engines like Bing and Google also use it as the title shown in search results.

<head>
<title>Title of your webpage.</title>
</head>
Display of the Title in the SERPs

Meta Description

The Meta Description is added to the head of the page to tell search engines what you want to use as a description of the page.

<head>
<meta name="description" content="Meta Description with less than 300 characters.">
</head>
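Since the title and the meta description are what search results usually show, an SEO scraper often extracts both together. A minimal sketch, assuming Python's built-in html.parser; SnippetReader and snippet_issues are hypothetical names, and the 300-character limit simply reuses the figure from the example above as a default, not an official rule.

```python
from html.parser import HTMLParser


class SnippetReader(HTMLParser):
    """Hypothetical helper: reads the <title> text and the meta description."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def snippet_issues(html, max_description=300):
    reader = SnippetReader()
    reader.feed(html)
    reader.close()
    issues = []
    if not reader.title.strip():
        issues.append("missing title")
    if not reader.description:
        issues.append("missing meta description")
    elif len(reader.description) > max_description:
        issues.append("meta description too long")
    return issues


page = """<head>
<title>Title of your webpage.</title>
<meta name="description" content="Meta Description with less than 300 characters.">
</head>"""
print(snippet_issues(page))             # []
print(snippet_issues("<head></head>"))  # ['missing title', 'missing meta description']
```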

Headings (h1-h6)

Headings help define the structure of the page, from the <h1> heading, the most important of the page, all the way down to the <h6>.

It is important to have a relevant structure where the h1 is followed by multiple h2 headings relevant to the h1, each h2 by multiple h3 headings relevant to that h2, and so on.

This simple structure helps readers scan a web page.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        <h1>Largest heading</h1>
        <h2>Large heading</h2>
        <h3>Slightly smaller heading</h3>
        <h4>Slightly smaller heading</h4>
        <h5>Slightly smaller heading</h5>
        <h6>Smallest Heading</h6>
    </body>
</html>

By default, the browser displays each heading level in a progressively smaller font, from <h1> down to <h6>.

Some general guidelines for headings:

These are guidelines, not rules. SEO is not about having the best heading structure, but about providing the best content to the user.

  • Have only one h1 per page. It is usually the title of your page or blog post;
  • Follow a structure where smaller headings come after bigger headings. This is just the proper way to write content. A subtitle should come after a title.

Example of a good heading structure.

<h1>Big Title</h1>
    <h2>Section 1</h2>
        <h3>Subtitle for section 1</h3>
        <h3>Subtitle for section 1</h3>
    <h2>Section 2</h2>
        <h3>Subtitle for section 2</h3>
        <h3>Subtitle for section 2</h3>
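Both guidelines can be checked automatically. This minimal sketch, assuming Python's built-in html.parser, flags pages that don't have exactly one h1 and headings that skip a level going down (an h2 followed directly by an h4, for instance). HeadingOutline and heading_issues are hypothetical names; as the guidelines say, these are signals to review, not hard rules.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Hypothetical helper: records the level (1 to 6) of each heading in order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.levels.append(int(tag[1]))


def heading_issues(html):
    outline = HeadingOutline()
    outline.feed(html)
    outline.close()
    issues = []
    h1_count = outline.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected 1 h1, found {h1_count}")
    previous = None
    for level in outline.levels:
        # Going down more than one level at a time (h2 -> h4) skips a level.
        if previous is not None and level > previous + 1:
            issues.append(f"h{previous} followed by h{level}")
        previous = level
    return issues


GOOD = """<h1>Big Title</h1>
    <h2>Section 1</h2>
        <h3>Subtitle for section 1</h3>
    <h2>Section 2</h2>
        <h3>Subtitle for section 2</h3>"""

print(heading_issues(GOOD))  # []
print(heading_issues("<h1>A</h1><h3>B</h3><h1>C</h1>"))
# ['expected 1 h1, found 2', 'h1 followed by h3']
```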

Images

Images can be added to a web page using the <img src=""> tag. The source can be either the path to an image file (for example, a file saved next to your HTML file) or a full URL to the image.

As you can see, this tag is a “self-closing” tag.

Here, there is also an HTML attribute, called src (source).

The attribute serves to tell the browser where to get the image from.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        Image from a file:
        <img src="cat.jpg">
        Image from a link:
        <img src="https://u3c4j6t8.stackpathcdn.com/wp-content/uploads/2014/06/Fat-Cat_400-2.jpg">
    </body>
</html>
Img HTML tag

There is another attribute that you should know about: the alt attribute.

<img src="cat.jpg" alt="Big Fat Lazy Cat">

It is recommended to add an alt attribute to each of your images for two reasons.

The alt attribute provides the text displayed in place of the image when the image doesn’t load properly.

Alt attribute showcase

Also, if you use your image as a link, the alt attribute will be used as the anchor text of the link.
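Missing alt attributes are easy to find in bulk. A minimal sketch, assuming Python's built-in html.parser; ImageAltAudit is a hypothetical name and dog.jpg is a made-up file. It treats an empty alt="" like a missing one, which you may want to relax for purely decorative images.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Hypothetical helper: records each <img> src with its alt text."""

    def __init__(self):
        super().__init__()
        self.images = []  # (src, alt text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            alt = (attrs.get("alt") or "").strip()  # None or "" both become ""
            self.images.append((attrs.get("src"), alt))

    def missing_alt(self):
        return [src for src, alt in self.images if not alt]


audit = ImageAltAudit()
audit.feed('<img src="cat.jpg" alt="Big Fat Lazy Cat"><img src="dog.jpg">')
audit.close()
print(audit.missing_alt())  # ['dog.jpg']
```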

Be wary of the size of the image file that you upload. Large images will slow down your webpage, which is bad for the user and bad for SEO.

Example of a slow loading image

Other HTML Tags

Lists

Lists can be ordered and unordered.

Unordered lists have simple bullet points, whereas ordered lists are numbered or alphabetized.

You can create an ordered list using <ol> and an unordered list using <ul>, with list item <li> tags nested inside.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        Ordered List:
        <ol>
            <li>Numbered item</li>
            <li>Numbered item</li>
            <li>Numbered item</li>
        </ol>
        Unordered List:
        <ul>
            <li>Bullet Point</li>
            <li>Bullet Point</li>
            <li>Bullet Point</li>  
        </ul>
    </body>
</html>

Table

Here is how to make a table without styling.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        <h1>Acronyms</h1>
        <table>
            <tr>
                <th>Name</th>
                <th>Meaning</th>
            </tr>
            <tr>
                <td>SEO</td>
                <td>Search Engine Optimization</td>
            </tr>
        </table>
    </body>
</html>
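Tables are a common target in web scraping because their rows and cells map neatly to records. A minimal sketch, assuming Python's built-in html.parser and that every <th> and <td> is explicitly closed, as in the example; TableScraper is a hypothetical name, and using the first row as column names is a design choice.

```python
from html.parser import HTMLParser

TABLE = """<table>
    <tr>
        <th>Name</th>
        <th>Meaning</th>
    </tr>
    <tr>
        <td>SEO</td>
        <td>Search Engine Optimization</td>
    </tr>
</table>"""


class TableScraper(HTMLParser):
    """Hypothetical helper: turns each <tr> into a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


scraper = TableScraper()
scraper.feed(TABLE)
scraper.close()
header, *records = scraper.rows
print([dict(zip(header, row)) for row in records])
# [{'Name': 'SEO', 'Meaning': 'Search Engine Optimization'}]
```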

Forms

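A form is created with the <form> element, which contains input fields such as <input>. The placeholder attribute is the text shown in the field until the user types something. The name attribute gives a name to the input field: it is used as the key of the value sent to the server when the form is submitted, and the server can then store that value, for example in a database.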

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        <form method="get">
            <input type="text" placeholder="full name" name="name">
            <button>Submit!</button>
        </form>
    </body>
</html>
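To see what the name attribute does, here is a minimal sketch in Python, assuming the form is sent with method="get": the browser then adds each field to the URL query string as a name=value pair, using the same kind of URL encoding that urllib.parse.urlencode produces. FormFields is a hypothetical helper and "Jane Doe" is a made-up value typed by a user.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class FormFields(HTMLParser):
    """Hypothetical helper: lists the name attribute of each <input>."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name"):
            self.names.append(attrs["name"])


fields = FormFields()
fields.feed('<form method="get">'
            '<input type="text" placeholder="full name" name="name">'
            '<button>Submit!</button></form>')
fields.close()
print(fields.names)  # ['name']

# Made-up value typed by a user: the name attribute becomes the key.
query_string = urlencode({fields.names[0]: "Jane Doe"})
print(query_string)            # name=Jane+Doe
print(parse_qs(query_string))  # {'name': ['Jane Doe']}
```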

Styling With CSS

CSS, or Cascading Style Sheets, is used to add styling to the HTML structure. There are different ways that you can add CSS to your HTML: using a separate stylesheet, using inline CSS, or adding the CSS in a <style> element in your <head>.

Inline CSS

Inline CSS is a quick and easy way to add styling to your web page. You simply add a style attribute to your tag and give it a value.

Use inline CSS to change only one element of a single web page.

Be careful: inline CSS can be a nightmare to maintain and it can easily create inconsistencies in the styling of your website.

The example below shows one of these inconsistencies where a user with great design skills like me might have decided to style their blog post to look like Windows 97.

<!DOCTYPE html>    
<html> 
    <head>
        <title>Simple SEO Title</title>
    </head>
    <body>
        <h1>Blog post title</h1>
        <h2>Normal Subtitle</h2>
        <h2 style="color:blue;font-size:32px;">Styled Subtitle</h2>
        <h2 style="color:blueviolet;text-shadow:2px 2px #ff0000;font-family:Times New Roman;">Ugly Subtitle</h2>
    </body>
</html>

Inline CSS Styling

Understand the code.

When you look at this line <h2 style="color:blue;font-size:32px;">, you can see that the h2 tag has a style attribute with two CSS properties: color and font-size. The color property is assigned a blue value and the font-size has the value of 32px.
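The same reading can be done in code, for example when a scraper needs the styles applied to an element. A minimal sketch in Python; parse_style is a hypothetical function, and it naively splits on semicolons, so it would break on values that contain one (such as some url(...) values).

```python
def parse_style(style):
    """Hypothetical helper: split a style attribute into {property: value}."""
    declarations = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skips the empty string after the final semicolon
        prop, value = declaration.split(":", 1)
        declarations[prop.strip().lower()] = value.strip()
    return declarations


print(parse_style("color:blue;font-size:32px;"))
# {'color': 'blue', 'font-size': '32px'}
print(parse_style("color:blueviolet;text-shadow:2px 2px #ff0000;font-family:Times New Roman;"))
# {'color': 'blueviolet', 'text-shadow': '2px 2px #ff0000', 'font-family': 'Times New Roman'}
```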

Inline CSS in the Head

CSS can be added in the <head> to change all the elements of a page.

Using the <style> element in the head, you can write CSS rules that apply to every matching tag in the entire web page. This is far more efficient and consistent than inline CSS in the body of the page.

Use CSS in the head to change all elements of a single web page.

<!DOCTYPE html>    
<html> 
    <head>
        <title>New Webpage</title>
        <style>
            h1 {
                text-align:center;
            }
            h2 {
                color:blue;
                font-size:32px;
            }
        </style>
    </head>
    <body>
        <h1>New Webpage</h1>
        <h2>Subtitle</h2>
        <h2>Subtitle</h2>
        <h2>Subtitle</h2>
    </body>
</html>

If the same rules lived in a separate style.css file instead (see the next section), that file could look like this, here with a few more properties:

h1 {
    color:green;
    font-size:50px;
    text-align:center;
}
h2 {
    color:rgb(5, 5, 47);
    font-size:32px;
}

Adding the CSS in the head rather than in a separate file saves the browser one HTTP request, which can make the first page load faster. The trade-off is that the browser can't cache this CSS separately, so it is downloaded again with every page. Don’t worry about that if you are only operating a small website.

Separate CSS Stylesheet

CSS styling can be added in a separate document so that it is shared by the entire website. It is the most efficient way to maintain the code and apply the DRY (Don’t Repeat Yourself) principle.

Adding <link rel="stylesheet" href="yourstyle.css"> tells the browser where to find the stylesheet using the href attribute with the location of the file as a value.

Use a separate CSS Stylesheet to change elements in the entire Website.

<!DOCTYPE html>    
<html> 
    <head>
        <title>New Webpage</title>
        <link rel="stylesheet" href="style.css">
    </head>
    <body>
        <h1>New Webpage</h1>
        <h2>Subtitle</h2>
        <h2>Subtitle</h2>
        <h2>Subtitle</h2>
    </body>
</html>

Responsive

A responsive website is a website that looks good on all devices.

To make your website responsive, you need two things: a viewport <meta> tag in the <head> and @media queries in the CSS stylesheet.

First, set the viewport in the head.

<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>

Second, define the screen width at which you want to change the behaviour, and the CSS properties to apply from that width.

@media screen and (min-width: 600px) {
    .class {
        /* CSS properties applied on screens 600px wide or wider */
    }
}

Create a Simple Web Page

Here is a simple template that wraps everything up into a simple website.

Simple Website with HTML and CSS

You can view the code in codepen or here below.

HTML Sample

<!DOCTYPE html>    
<html> 
    <head>
        <meta content="width=device-width, initial-scale=1.0, viewport-fit=cover" name="viewport">
        <title>Simple SEO Title</title>
        <meta name="description" content="Meta Description with less than 300 characters.">  
        <meta name="robots" content="noindex, nofollow"> <!--This page will not get indexed-->
        <link rel="stylesheet" href="style.css">
        <link rel="alternate" href="https://www.example.com/en" hreflang="en-ca">
        <link rel="canonical" href="https://www.example.com/fr"> 
    </head>
    <body>
        <header>
        <div class="nav">
            <ul>
            <li class="home"><a href="#">Home</a></li>
            <li class="blog"><a class="active" href="#">Blog</a></li>
            <li class="about"><a href="#">About</a></li>
            <li class="contact"><a href="#">Contact</a></li>
            </ul>
        </div>
        </header>
        <div class="body">
            <h1>Blog</h1>
            <p>Lorem ipsum dolor <a href="#">Anchor Text Link</a> sit amet consectetur adipisicing elit. Ipsum vel laudantium a voluptas labore. Dolorum modi doloremque, dolore molestias quos nam a laboriosam neque asperiores fugit sed aut optio earum!</p>
            <h2>Subtitle</h2>
            <p>Lorem ipsum dolor sit amet consectetur adipisicing elit. Ipsum vel laudantium a voluptas labore. Dolorum modi doloremque, dolore molestias quos nam a <a href="#" rel="nofollow">Nofollow link</a> laboriosam neque asperiores fugit sed aut optio earum!</p>
        </div>
    </body>
  </html>

CSS Sample

body {
    font-family: 'roboto';
    margin: 0;
    padding: 0;
    background: rgb(230, 229, 229);
  }
   
  .nav ul {
    list-style: none;
    background-color: #444;
    text-align: center;
    padding: 0;
    margin: 0;
  }
  .nav li {
    font-size: 1.2em;
    line-height: 40px;
    height: 40px;
    border-bottom: 1px solid rgb(190, 190, 190);
  }
   
  .nav a {
    text-decoration: none;
    color: #fff;
    display: block;
  }
   
  .nav a:hover {
    background-color: #3b3b3b;
  }
   
  .nav a.active {
    background-color: #fff;
    color: #444;
    cursor: default;
  }

  .body {
    margin: 10px 30px 50px 30px;
  }
   
  @media screen and (min-width: 600px) {
    .nav li {
      width: 120px;
      border-bottom: none;
      height: 50px;
      line-height: 50px;
      font-size: 1.4em;
    }
    .nav li {
      display: inline-block;
      margin-right: -4px;
    }
    .body {
        display: inline-block;
        margin: 50px 100px;
      }
  }

Interesting HTML Resources

Learn HTML with Codecademy

Check out the ultimate HTML reference by Jeremy Thomas.

Conclusion

This is the end of this tutorial on HTML for SEO. You are now ready to learn SEO, web scraping and web development.
