In this guide, I will show you how to use Regex for SEO, even if you have no programming knowledge.
RegEx are easy to learn and amazingly useful, so make sure that you go through this entire tutorial: it is going to be one of the best time-versus-results investments in your SEO career.
This post is part of the complete Guide on Python for SEO
What Are Regular Expressions (Regex)?
Regex, or regular expressions, are used to detect patterns in sequences of characters in strings.
With RegEx, you can easily match many results that have the same pattern.
Basic Regular Expressions
. | Any character |
.* | 0 or more characters |
.+ | 1 or more characters |
? | Optional character |
^ | Beginning of a line |
$ | End of a line |
\ | Escape a special character |
For example, one of the most common patterns that I use with Google Analytics is this one:
.*site1.*|.*site2.*
or the equivalent:
.*site(1|2).*
This way I can match any of those results:
#Match
site1.com
site1.fr
site2.ca
www.site2.com
site2.ca/url-path
#No Match
www.google.com
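To check the pattern above outside of Google Analytics, here is a minimal sketch in Python using the standard re module. The list of hostnames simply repeats the examples from this section.

```python
import re

# Pattern from the text: any string containing "site1" or "site2".
pattern = re.compile(r".*site(1|2).*")

candidates = [
    "site1.com",
    "site1.fr",
    "site2.ca",
    "www.site2.com",
    "site2.ca/url-path",
    "www.google.com",
]

# re.match starts at the beginning of the string; the leading .* lets
# "site1" or "site2" appear anywhere after that.
matched = [url for url in candidates if pattern.match(url)]
not_matched = [url for url in candidates if not pattern.match(url)]

print("Match:", matched)
print("No match:", not_matched)
```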
RegEx is not specific to any programming language. So, whether you are using Google Analytics, or programming in Python, JavaScript or Java, you’ll need at some point to use Regular Expressions.
Regular expressions have different flavours from one programming language to the other.
However, if you learn how to use general regular expressions, you’ll have no problem using them in any of the programming languages.
Get Started With RegEx
This guide will walk you through the basics of RegEx. If you want to go further, make sure that you look at my favourite tool, Regex101, and this RegEx Cheat Sheet.
Not All RegEx are Equal
Regular Expressions are used in computer programming and data analysis.
Depending on the programming language that you use, or the tool that you use, some RegEx will not work.
Why Learn RegEx for SEO?
SEOs start using Regex mostly because they use Google Analytics and data analysis.
Then, they’ll start using it for crawling and scraping purposes and, as their career and knowledge progress, to make API calls, until they end up using it everywhere.
For example, say you want to keep only the organic traffic coming from Google, including Google Search and Google For Jobs, but excluding Google CPC.
In this case, you would go to Acquisition > All Traffic > Source/Medium > Advanced
and use the .*google.*organic.*
regular expression to filter your results.

And then you’d get a report like this.

I know that this is fairly basic, but I just wanted to show why you’ll absolutely need regexes one day or another in your SEO career.
The Regular Expressions in Google Analytics are quite limited compared to what you can actually do with Regex.
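The same Source/Medium filter can be tried in Python. This is only a sketch: the source_mediums values below are sample strings made up for illustration, not data exported from a real report.

```python
import re

# Hypothetical Source/Medium values, invented for this example.
source_mediums = [
    "google / organic",
    "google_jobs_apply / organic",
    "google / cpc",
    "bing / organic",
    "(direct) / (none)",
]

# "google" followed, somewhere later, by "organic".
pattern = re.compile(r".*google.*organic.*")

google_organic = [sm for sm in source_mediums if pattern.match(sm)]
print(google_organic)
```

The CPC row is dropped because it contains "google" but never "organic", and the Bing row is dropped because it never contains "google".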
Regular Expressions Basics
Let’s dive into the regular expression basics.
Match Characters
To match one or more characters, you can use wildcards, character sets, anchors and other special characters.
- . matches any single character. SE. will match SEO and SEM;
- [aeiou] matches one of those vowels. b[aiu]g will match bag, big and bug. With the global flag (JavaScript), /[aeiou]/g would match multiple vowels;
- [a-z] matches a range of characters. This would match any lowercase character from the alphabet. To match any lowercase and uppercase characters, you could use /[a-z]/i or [a-zA-Z];
- [0-9] matches a range of numbers from 0 to 9. You can combine ranges to match numbers and letters like this: [2-5b-h];
- ^ only matches if the string starts with the pattern. ^SEO.* matches SEO is great but not I love SEO;
- $ only matches if the string ends with the pattern. .*regex$ matches I love working with regex, but does not match regex are awesome;
- ? says the previous character is optional. In Colou?r, the “u” is optional, so it matches Color and Colour.
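The character-matching rules in this section can be verified with a short Python sketch. The helper name matches is my own, and the extra test words (SE, beg, Colouur) are added only to show a non-match.

```python
import re

def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# . matches any single character ("SE" is too short to match).
dot = [w for w in ["SEO", "SEM", "SE"] if matches(r"SE.", w)]

# A character set matches exactly one of the listed characters.
vowel_set = [w for w in ["bag", "big", "bug", "beg"] if matches(r"b[aiu]g", w)]

# re.findall returns every match, much like a global flag.
vowels = re.findall(r"[aeiou]", "regex")

# Ranges of digits and letters can share one set.
mixed = re.findall(r"[2-5b-h]", "a1b2c3z9")

# ^ anchors at the start, $ anchors at the end.
starts = [s for s in ["SEO is great", "I love SEO"] if matches(r"^SEO.*", s)]
ends = [s for s in ["I love working with regex", "regex are awesome"]
        if matches(r".*regex$", s)]

# ? makes the previous character optional.
optional = [w for w in ["Color", "Colour", "Colouur"] if re.fullmatch(r"Colou?r", w)]

print(dot, vowel_set, vowels, mixed, starts, ends, optional)
```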
OR / And Logic
You’ll often want to match one of several alternatives, or to merge multiple conditions, in your regular expressions.
Using the | symbol (logical OR), you’ll be able to match any one of multiple conditions.
When you need ALL conditions to be true, there is no AND operator, but you can get the same result by combining lookaheads using the pattern .*(?=.*pattern)(?=.*pattern).*
For example (ignoring case):
- python|seo – Matches: python OR seo. Matches: Python jobs, SEO jobs, Python for SEO.
- .*(?=.*python)(?=.*seo).* – Matches: python AND seo. Matches: Python for SEO, SEO with Python but does not match SEO jobs.
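Here is a small Python sketch of the OR and AND patterns. The queries list is sample data, with Java jobs added as a non-match, and re.IGNORECASE is an assumption on my side so that Python and SEO match regardless of capitalization.

```python
import re

# Sample queries, for illustration only.
queries = ["Python jobs", "SEO jobs", "Python for SEO", "SEO with Python", "Java jobs"]

# OR: either word is enough.
or_pattern = re.compile(r"python|seo", re.IGNORECASE)

# AND: both lookaheads must succeed from the same starting position.
and_pattern = re.compile(r".*(?=.*python)(?=.*seo).*", re.IGNORECASE)

either = [q for q in queries if or_pattern.search(q)]
both = [q for q in queries if and_pattern.search(q)]

print("OR:", either)
print("AND:", both)
```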
The AND syntax is not supported in Google Analytics.
You would need to do it this way.

Quantifiers
Quantifiers, or quantity specifiers, tell the regex how many times the previous element (a character, a set or a group) can repeat.
One or more times | + |
Twice | {2} |
Three to five times | {3,5} |
Zero or more times | * |
Once or none | ? |
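To see each quantifier from the table in action, this sketch applies them to the letter a. The samples list and the full helper are hypothetical names used only for this illustration; re.fullmatch requires the whole string to match the pattern.

```python
import re

# Strings made of zero to six "a" characters.
samples = ["", "a", "aa", "aaa", "aaaaa", "aaaaaa"]

def full(pattern):
    """Return the samples that match the pattern from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

one_or_more = full(r"a+")       # one or more times
exactly_two = full(r"a{2}")     # twice
three_to_five = full(r"a{3,5}") # three to five times
zero_or_more = full(r"a*")      # zero or more times
once_or_none = full(r"a?")      # once or none

print(one_or_more, exactly_two, three_to_five, zero_or_more, once_or_none)
```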
Negated Character Sets
When you want to create a set of characters that you don’t want to match, you need to use negated character sets.
To create them, add a caret at the start of a character set ([^]).
- [^...] matches any single character that is not in the set;
- [^aeiou] matches a single character not present in the list [aeiou].
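A quick Python sketch of negated character sets follows. The second example, cleaning a URL slug, is my own illustration of a possible use and not something taken from a specific tool.

```python
import re

# Every character of "regex" that is NOT a vowel.
consonants = re.findall(r"[^aeiou]", "regex")

# Remove anything that is not a lowercase letter, a digit or a hyphen.
# The hyphen is placed last in the set so that it is read literally.
cleaned = re.sub(r"[^a-z0-9-]", "", "python-for-seo!?")

print(consonants)
print(cleaned)
```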
Positive And Negative Lookahead
Lookaheads tell the regex engine to look ahead in your string to check for the pattern you specify, without including it in the match. There are positive lookaheads ((?=)) and negative lookaheads ((?!)).
se(?=o)
seo #match "se"
sem #no match
se(?!o)
seo #no match
sem #match "se"
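The seo and sem examples can be reproduced in Python as a minimal sketch. Note that the matched text is only "se": the lookahead checks the next character but does not consume it.

```python
import re

positive = re.compile(r"se(?=o)")  # "se" only when followed by "o"
negative = re.compile(r"se(?!o)")  # "se" only when NOT followed by "o"

pos_seo = positive.search("seo")
pos_sem = positive.search("sem")
neg_seo = negative.search("seo")
neg_sem = negative.search("sem")

# search returns None when there is no match.
for label, m in [("se(?=o) on seo", pos_seo), ("se(?=o) on sem", pos_sem),
                 ("se(?!o) on seo", neg_seo), ("se(?!o) on sem", neg_sem)]:
    print(label, "->", m.group() if m else "no match")
```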
Greedy and Lazy Matching
In regular expressions, a greedy match finds the longest possible part of a string that satisfies the regex. A lazy match is the opposite. It finds the smallest possible part of the string that matches the regex.
- .* is a greedy match since it takes as many characters as it can. <.*> will match <h1>This is HTML</h1>
- Adding ? after a quantifier makes it lazy. <.*?> will match <h1>
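The difference between greedy and lazy matching is easier to see side by side. This Python sketch reuses the HTML string from the example above.

```python
import re

html = "<h1>This is HTML</h1>"

# Greedy: .* runs to the last ">" it can find.
greedy = re.search(r"<.*>", html).group()

# Lazy: .*? stops at the first ">" it can find.
lazy = re.search(r"<.*?>", html).group()

# With findall, the lazy version returns each tag separately.
tags = re.findall(r"<.*?>", html)

print(greedy)
print(lazy)
print(tags)
```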
Group Elements of a RegEx
You can group elements of a RegEx with parentheses () in an element called a capture group.
sam.*(hunt|jackson) would match sam hunt and samuel l. jackson, but not sammy davis jr.
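In Python, the text captured by the parentheses can be read with group(1). Here is a minimal sketch using the three names from the example; the captured dictionary is just a convenient way to display the result.

```python
import re

pattern = re.compile(r"sam.*(hunt|jackson)")
names = ["sam hunt", "samuel l. jackson", "sammy davis jr."]

# Map each matching name to the text captured by the group.
captured = {}
for name in names:
    match = pattern.search(name)
    if match:
        captured[name] = match.group(1)

print(captured)
```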
Other Useful Regex
(?<=[\/])\d{2,}
Matches any numeric ID of two or more digits preceded by a forward slash.
^\s+|\s+$
Selects all white spaces at the beginning and at the end of a string. This can be useful when doing data manipulation.
(?<=\.)(.*?)(?=\.)
Lets you extract a domain name. This will match any string between two dots.
(?<=string)(.*)
Matches anything after a string, excluding that string. Useful to clean-up URLs.
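Here is how these four patterns might be used in Python, as a sketch. The URLs are made-up examples on example.com, and in the last pattern I replaced the placeholder string with example\.com purely for illustration. Python’s re module only accepts fixed-width lookbehinds, which all of these are.

```python
import re

# Hypothetical URL used only for this example.
url = "https://example.com/products/12345/reviews/67"

# IDs of two or more digits that follow a forward slash.
ids = re.findall(r"(?<=[\/])\d{2,}", url)

# Strip leading and trailing whitespace.
trimmed = re.sub(r"^\s+|\s+$", "", "   python for seo  ")

# First string found between two dots.
domain = re.search(r"(?<=\.)(.*?)(?=\.)", "www.example.com").group(1)

# Everything after a given string, excluding that string.
path = re.search(r"(?<=example\.com)(.*)", "https://example.com/blog/regex").group(1)

print(ids, repr(trimmed), domain, path)
```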
Flags (not for GA or GSC)
Flags change how the whole pattern is applied. You might want to ignore case when matching, or find every match instead of only the first one.
To do this in languages that write a regex between slashes, such as JavaScript, you add the flag after the closing slash like this:
/google/i
Matches google and Google.
The most useful flags are:
- i ignores case;
- g matches more than once (JavaScript).
Shorthand character classes are not flags. They go inside the pattern and determine what kind of character to match:
- \d matches one digit from 0 to 9;
- \w matches an ASCII letter, digit or underscore. It is the same as [A-Za-z0-9_];
- \s matches whitespace;
- \D matches anything that is not a digit from 0 to 9;
- \W matches anything that is not an ASCII letter, digit or underscore;
- \S matches anything that is not a whitespace.
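Python does not use trailing flags. As a sketch of the equivalent, case-insensitive matching is requested with re.IGNORECASE, re.findall returns every match, and the shorthand classes such as \d, \w and \s go inside the pattern. I pass re.ASCII below because Python’s \w and \d also accept non-ASCII letters and digits by default. The sample strings are made up.

```python
import re

# Ignore case: the flag is passed as an argument.
ignore_case = [w for w in ["google", "Google", "bing"]
               if re.search(r"google", w, re.IGNORECASE)]

# findall returns all matches, not just the first one.
numbers = re.findall(r"\d+", "top 10 of 2024", re.ASCII)

# \w+ groups letters, digits and underscores into words.
words = re.findall(r"\w+", "python_for seo-2024", re.ASCII)

# \s matches spaces and tabs; \D matches anything that is not a digit.
spaces = re.findall(r"\s", "a b\tc")
digits_only = re.sub(r"\D", "", "ID: 42-17", flags=re.ASCII)

print(ignore_case, numbers, words, spaces, digits_only)
```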
Testing Your Regular Expressions
Here are three websites to test, save and share your regular expressions.
Go the Extra Length
Check out Paul Shapiro’s presentation on regular expressions.
To learn more technical SEO, I strongly suggest that you start learning Python.
Sr SEO Specialist at Seek (Melbourne, Australia). Specialized in technical SEO. In a quest to programmatic SEO for large organizations through the use of Python, R and machine learning.