Beginner Guide To Regex For SEO

In this guide, I will show you how to use Regex for SEO, even if you have no programming knowledge.

Regular expressions are easy to learn and amazingly useful, so make sure that you go through this entire tutorial: it is going to be one of the best time-versus-results investments in your SEO career.

This post is to help you learn regular expressions. Not all of the RegExes will work in Google Search Console as it uses a particular syntax. Read this post if you are looking for RegEx for Google Search Console.


What Are Regular Expressions (Regex)?

Regex, or regular expressions, are used to detect patterns in strings, which are sequences of characters.

With RegEx, you can easily match many results that have the same pattern.

Basic Regular Expressions

.     Any character
.*    0 or more characters
.+    1 or more characters
?     Optional character
^     Beginning of a line
$     End of a line
\     Escape a special character

For example, one of the most common patterns that I use with Google Analytics is this one:

.*site1.*|.*site2.*

or the equivalent:

.*site(1|2).*

This way I can match any of those results:

#Match
site1.com
site1.fr
site2.ca
www.site2.com
site2.ca/url-path

#No Match
www.google.com
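
As a minimal sketch, the same pattern can be tried in Python with the built-in re module. The list name hostnames is only an illustration; the values are the ones from the example above.

```python
import re

# The pattern from the example above: "site1" or "site2" anywhere in the string.
pattern = re.compile(r".*site(1|2).*")

hostnames = [
    "site1.com",
    "site1.fr",
    "site2.ca",
    "www.site2.com",
    "site2.ca/url-path",
    "www.google.com",
]

# Keep only the strings where the pattern is found.
matched = [h for h in hostnames if pattern.search(h)]
print(matched)
```

Everything except www.google.com ends up in matched.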

RegEx is not specific to any programming language. So, whether you are using Google Analytics, or programming in Python, JavaScript or Java, you’ll need at some point to use Regular Expressions.

Regular expressions have different flavours from one programming language to the other.

However, if you learn how to use general regular expressions, you’ll have no problem using them in any of the programming languages.

Get Started With RegEx

This guide will walk you through the basics of RegEx. If you want to go further, make sure that you look at my favourite tool, Regex101, and this RegEx Cheat Sheet.

Not All RegEx are Equal

Regular Expressions are used in computer programming and data analysis.

Depending on the programming language that you use, or the tool that you use, some RegEx will not work.

Why Learn RegEx for SEO?

SEOs start using Regex mostly because they use Google Analytics and data analysis.

Then, they’ll start using it for crawling and scraping purposes and as their career and knowledge progress, they’ll start using it to make API calls, until they use them everywhere.

For example, say you want to keep only the organic traffic coming from Google, including Google Search and Google For Jobs, but excluding Google CPC.

In this case, you would go to Acquisition > All Traffic > Source/Medium > Advanced and use the .*google.*organic.* regular expression to filter your results.

And then you’d get a report like this.
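
To see what that filter keeps outside of Google Analytics, here is a small Python sketch. The Source/Medium values below are made up for illustration and are not taken from a real report.

```python
import re

# The same expression used in the Google Analytics filter.
pattern = re.compile(r".*google.*organic.*")

# Hypothetical Source/Medium values, invented for this example.
source_mediums = [
    "google / organic",
    "google_jobs_apply / organic",
    "google / cpc",
    "bing / organic",
]

# Keep rows that contain "google" followed later by "organic".
google_organic = [s for s in source_mediums if pattern.search(s)]
print(google_organic)
```

Only the two rows that contain both google and organic are kept; google / cpc and bing / organic are dropped.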

I know that this is fairly basic, but I just wanted to show why you’ll absolutely need regexes one day or the other in your SEO career.

The Regular Expressions in Google Analytics are quite limited compared to what you can actually do with Regex.

Regular Expressions Basics

Let’s dive into the regular expression basics.

Match Characters

To match one or multiple characters, you can use wildcards, character sets and anchors.

  • . matches any single character. SE. will match SEO and SEM;
  • [aeiou] matches one of those vowels. b[aiu]g will match bag, big and bug. [aeiou]+ would match multiple vowels in a row;
  • [a-z] matches a range of characters. This would match any lowercase letter of the alphabet. To match both lowercase and uppercase letters, you could use [a-zA-Z] or the ignore-case flag (/[a-z]/i in JavaScript);
  • [0-9] matches a range of digits from 0 to 9. You can combine ranges to match digits and letters like this: [2-5b-h];
  • ^ only matches if the string starts with the pattern. ^SEO.* matches SEO is great but not I love SEO;
  • $ only matches if the string ends with the pattern. .*regex$ matches I love working with regex, but does not match regex are awesome;
  • ? makes the previous character optional. Colou?r matches Color and Colour.
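
The patterns in this list can be checked quickly in Python. This is only a sketch: the helper name matches is an assumption made for the example, and it simply wraps re.search.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"SE.", "SEM"))                         # True: "." stands for any character
print(matches(r"b[aiu]g", "big"))                     # True: "i" is in the set
print(matches(r"b[aiu]g", "beg"))                     # False: "e" is not in the set
print(matches(r"^SEO.*", "SEO is great"))             # True: starts with SEO
print(matches(r"^SEO.*", "I love SEO"))               # False: SEO is not at the start
print(matches(r".*regex$", "I love working with regex"))  # True: ends with regex
print(matches(r".*regex$", "regex are awesome"))      # False: regex is not at the end
print(matches(r"Colou?r", "Color"))                   # True: the "u" is optional

# Combined ranges: digits 2 to 5 and letters b to h.
print(re.findall(r"[2-5b-h]", "a1b2c3"))              # ['b', '2', 'c', '3']
```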

OR / And Logic

You will often want to match any one of several conditions, or require several conditions at the same time.

Using the | symbol, you’ll be able to match multiple conditions.

When you need ALL conditions to be true, there is no AND operator, but you can get the same result by chaining lookaheads with the pattern .*(?=.*pattern1)(?=.*pattern2).*

For example:

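  • python|seo – Matches: python OR seo. With case ignored, matches: Python jobs, SEO jobs, Python for SEO. Do not put spaces around the |, or they become part of the pattern.
  • .*(?=.*python)(?=.*seo).* – Matches: python AND seo. With case ignored, matches: Python for SEO, SEO with Python but does not match SEO jobs.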
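
Here is a short Python sketch of both patterns. The list of titles is invented for the example, and re.IGNORECASE is passed so that python also matches Python.

```python
import re

# OR: either word is enough.
or_pattern = re.compile(r"python|seo", re.IGNORECASE)
# AND: both lookaheads must succeed from the same position.
and_pattern = re.compile(r".*(?=.*python)(?=.*seo).*", re.IGNORECASE)

# Hypothetical page titles used only for illustration.
titles = ["Python jobs", "SEO jobs", "Python for SEO", "SEO with Python", "Java jobs"]

either = [t for t in titles if or_pattern.search(t)]
both = [t for t in titles if and_pattern.search(t)]

print(either)  # every title except "Java jobs"
print(both)    # only the titles containing both words
```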

The AND syntax is not supported in Google Analytics.

You would need to do it this way.

Quantifiers

Quantifiers, or quantity specifiers, set the number of times that the previous character, set or group can repeat.

One or more times      +
Twice                  {2}
Three to five times    {3,5}
Zero or more times     *
Once or none           ?
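
Each quantifier can be tried in Python as a rough check. The helper name full_match is an assumption for this sketch; it uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def full_match(pattern, text):
    """Return True only if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full_match(r"go+gle", "gooogle"))    # True: "+" allows one or more "o"
print(full_match(r"go+gle", "ggle"))       # False: at least one "o" is required
print(full_match(r"o{2}", "oo"))           # True: exactly two
print(full_match(r"[0-9]{3,5}", "1234"))   # True: between three and five digits
print(full_match(r"[0-9]{3,5}", "12"))     # False: only two digits
print(full_match(r"ab*", "a"))             # True: "*" allows zero "b"
print(full_match(r"https?", "http"))       # True: the "s" is optional
```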

Negated Character Sets

When you want to create a set of characters that you don’t want to match, you need to use negated character sets.

To create them, put a caret at the start of a character set ([^]).

  • [^] matches a single character that is not in the set. [^aeiou] matches a single character not present in the list [aeiou];
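
A small Python sketch of negated sets follows. The sample strings are made up for the example.

```python
import re

# Every character of "seo" that is NOT a vowel.
consonants = re.findall(r"[^aeiou]", "seo")
print(consonants)  # ['s']

# A common use: strip everything that is not a digit.
digits_only = re.sub(r"[^0-9]", "", "ID: 4521-B")
print(digits_only)  # 4521
```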

Positive And Negative Lookahead

Lookaheads are patterns that look ahead in your string to check for the pattern you specify, without including it in the match. There are positive lookaheads ((?=)) and negative lookaheads ((?!)).

se(?=o)
seo #match "se"
sem #no match 

se(?!o)
seo #no match
sem #match "se"
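
The same four cases can be reproduced in Python. This is a sketch; the helper name first_match is an assumption and returns the matched text, or None when nothing matches.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when there is no match."""
    found = re.search(pattern, text)
    return found.group() if found else None

print(first_match(r"se(?=o)", "seo"))  # se   (the "o" is checked but not captured)
print(first_match(r"se(?=o)", "sem"))  # None
print(first_match(r"se(?!o)", "seo"))  # None
print(first_match(r"se(?!o)", "sem"))  # se
```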

Greedy and Lazy Matching

In regular expressions, a greedy match finds the longest possible part of a string that satisfies the regex. A lazy match is the opposite. It finds the smallest possible part of the string that matches the regex.

  • .* is a greedy match since it matches as many characters as possible. <.*> will match <h1>This is HTML</h1>
  • Adding ? after a quantifier makes it lazy. <.*?> will match <h1>
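
The difference is easy to see in Python with the HTML string from the example above. The variable names are only illustrative.

```python
import re

html = "<h1>This is HTML</h1>"

# Greedy: runs to the last ">" in the string.
greedy = re.search(r"<.*>", html).group()
# Lazy: stops at the first ">" it can.
lazy = re.search(r"<.*?>", html).group()
# With findall, the lazy pattern returns each tag separately.
tags = re.findall(r"<.*?>", html)

print(greedy)  # <h1>This is HTML</h1>
print(lazy)    # <h1>
print(tags)    # ['<h1>', '</h1>']
```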

Group Elements of a RegEx

You can group elements of a RegEx with parentheses () in an element called a capture group.

  • sam.*(hunt|jackson) would match sam hunt and samuel l. jackson, but not sammy davis jr.
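
In Python, the text captured by the group can be read back with group(1). This sketch uses the three names from the example; the dictionary name captured is an assumption.

```python
import re

pattern = re.compile(r"sam.*(hunt|jackson)")
names = ["sam hunt", "samuel l. jackson", "sammy davis jr."]

# Map each name to the text captured by the group, or None when there is no match.
captured = {}
for name in names:
    found = pattern.search(name)
    captured[name] = found.group(1) if found else None

print(captured)
```

The first two names map to hunt and jackson, and sammy davis jr. maps to None.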

Other Useful Regex

(?<=[\/])\d{2,} Matches any numeric ID of two or more digits preceded by a forward slash.

^\s+|\s+$ Selects all whitespace at the beginning and at the end of a string. This can be useful when doing data manipulation.

(?<=\.)(.*?)(?=\.) Lets you extract a domain name. This will match any string between two dots.

(?<=string)(.*) Matches anything after a string, excluding that string. Useful to clean up URLs.
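
Here is how these four expressions behave in Python, as a rough sketch. The URLs and the utm_source parameter are invented for the example, and utm_source= stands in for the generic string in the last pattern.

```python
import re

# Numeric ID of two or more digits right after a forward slash.
url = "https://example.com/product/12345/red-shoes"
product_id = re.search(r"(?<=[\/])\d{2,}", url).group()

# Remove whitespace at the beginning and at the end of a string.
trimmed = re.sub(r"^\s+|\s+$", "", "  seo tips \n")

# First string found between two dots.
domain = re.search(r"(?<=\.)(.*?)(?=\.)", "www.example.com").group()

# Everything after a given string (here "utm_source=", a hypothetical choice).
tagged_url = "https://example.com/?utm_source=newsletter"
source = re.search(r"(?<=utm_source=)(.*)", tagged_url).group()

print(product_id)  # 12345
print(trimmed)     # seo tips
print(domain)      # example
print(source)      # newsletter
```

Note that Python only accepts lookbehinds of a fixed length, so the string inside (?<=...) cannot contain quantifiers such as * or +.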

Flags (not for GA or GSC)

Flags change how a pattern is matched. You might want to ignore case when matching, or find every match instead of only the first one.

To do this, you add the flag after the closing delimiter of your regex, like this:

/google/i

Matches google and Google.

The most useful flags are:

  • i ignores case;
  • g matches more than once (JavaScript).

The following shorthand character classes are not flags. They go inside the pattern itself:

  • \d matches one digit from 0 to 9;
  • \w matches an ASCII letter, digit or underscore. It is the same as [A-Za-z0-9_];
  • \s matches whitespace;
  • \D matches anything that is not a digit from 0 to 9;
  • \W matches anything that is not an ASCII letter, digit or underscore;
  • \S matches anything that is not whitespace.
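
In Python, ignoring case is requested with the re.IGNORECASE argument rather than by adding something to the end of the pattern, and re.findall returns every match, so no equivalent of g is needed. The sketch below uses made-up sample strings.

```python
import re

# Ignore case: "google" also matches "Google".
ignore_case = re.search(r"google", "Google", re.IGNORECASE) is not None

# \d matches a single digit; \d+ matches a run of digits.
single_digits = re.findall(r"\d", "Top 10 SEO tips for 2024")
numbers = re.findall(r"\d+", "Top 10 SEO tips for 2024")

# \w+ matches runs of letters, digits and underscores.
words = re.findall(r"\w+", "seo_tips 101!")

# \s+ matches runs of whitespace, which is handy for splitting.
parts = re.split(r"\s+", "regex  for\tseo")

# \D matches anything that is not a digit.
phone_digits = re.sub(r"\D", "", "+1 (514) 555-0199")

print(ignore_case)    # True
print(single_digits)  # ['1', '0', '2', '0', '2', '4']
print(numbers)        # ['10', '2024']
print(words)          # ['seo_tips', '101']
print(parts)          # ['regex', 'for', 'seo']
print(phone_digits)   # 15145550199
```

In Python 3, \d and \w also match non-ASCII digits and letters on text strings unless re.ASCII is passed.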

Testing Your Regular Expressions

Here are three websites to test, save and share your regular expressions.

Go the Extra Length

Check out Paul Shapiro’s presentation on regular expressions.

To learn more technical SEO, I strongly suggest that you start learning Python.