What is RegEx in SEO?

Regular expressions, or RegEx, are sequences of characters that define search patterns.

Regular expressions form their own language, built from a combination of literal text, metacharacters, special sequences and quantifiers.

Once you master the language, you’ll be able to analyze data far more efficiently.

Regular expressions originate in the work of American mathematician Stephen Kleene, who formalized the concept in the 1950s. RegEx is often used to “find and replace” text. It is the standard on which pattern matching is based, and it is implemented in programming languages, search engines, text editors, word processors and much more.

In SEO, RegEx is very useful because it can match the most fundamental elements of the discipline: keywords and URLs. This article looks at where and how RegEx can be used in digital marketing, especially for search experience optimization.

The Importance of RegEx

Manually picking and choosing keywords and URLs can be very cumbersome, especially when the keyword or URL sets are large. (Think enterprise sites.)

RegEx patterns, when used correctly and efficiently, help you complete that selection (and subsequently optimization) process at scale and at speed. 

Using the same RegEx patterns to compare keyword and page performance across analytics, Search Console, rankings and site audits helps you standardize your approach to SEO.

Since regular expressions can be used to match based on search patterns, they are extremely useful in extracting information from text.

Recommended Reading: Extracting Additional Content Using XPath for SEO

Where to Use Regular Expressions

RegEx Patterns in Analytics

Analytics is considered one of the bedrocks of SEO, and analyzing and understanding your customers' journey is invaluable. RegEx can be used to group pages, such as by URL pattern, and then analyze the performance of each group.

For example, using RegEx to segment pages lets you analyze traffic and bounces by content type on a much larger scale than you could with traditional operators such as "contains" or "begins with".
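To make the idea concrete, here is a minimal Python sketch of grouping pages by content type. The folder patterns, the `CONTENT_TYPES` mapping and the sample session counts are all hypothetical; in practice the rows would come from an analytics export, and your own URL structure determines the patterns.

```python
import re
from collections import defaultdict

# Hypothetical content-type patterns; adjust the folders to match your own URL structure.
CONTENT_TYPES = {
    "blog": re.compile(r"^/blog/"),
    "product": re.compile(r"^/(p|product)/"),
    "category": re.compile(r"^/(cat|category|c)/"),
}

def content_type(path: str) -> str:
    """Return the first content type whose pattern matches the URL path."""
    for name, pattern in CONTENT_TYPES.items():
        if pattern.search(path):
            return name
    return "other"

def sessions_by_type(rows):
    """Sum sessions per content type from (path, sessions) rows."""
    totals = defaultdict(int)
    for path, sessions in rows:
        totals[content_type(path)] += sessions
    return dict(totals)

# Illustrative sample data, not real analytics output.
rows = [
    ("/blog/what-is-regex", 120),
    ("/category/shoes", 80),
    ("/c/hats", 40),
    ("/product/red-shoe", 60),
    ("/about", 10),
]
print(sessions_by_type(rows))
# {'blog': 120, 'category': 120, 'product': 60, 'other': 10}
```

Because `content_type` returns the first pattern that matches, list more specific patterns before broader ones.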

RegEx Patterns in Search Console

Search intent is the underlying reason someone is searching, and segmenting data by that intent is a crucial component of any digital marketing strategy.

This is most commonly used for Brand and Non-brand analysis. By using RegEx to specify the patterns to match, the data can be segmented on the fly.

RegEx patterns can be used to segment the audience based on what they were thinking about and what they were looking for when they found your site.
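As an illustration of intent segmentation, the Python sketch below classifies queries using common question and purchase modifiers. The modifier lists and the `classify_intent` helper are assumptions made for this example, not a standard taxonomy. Google Search Console's own regex filter uses RE2 syntax, which supports the alternation, groups and `\b` word boundaries used here but not features such as lookarounds, so keep patterns simple if you plan to reuse them there.

```python
import re

# Hypothetical intent patterns built from common query modifiers.
INTENT_PATTERNS = [
    ("informational", re.compile(r"^(how|what|why|when|where|who)\b|\bguide\b", re.IGNORECASE)),
    ("transactional", re.compile(r"\b(buy|price|cheap|discount|coupon)\b", re.IGNORECASE)),
]

def classify_intent(query: str) -> str:
    """Return the first intent whose pattern matches the query."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(query):
            return intent
    return "unclassified"

queries = ["how to write regex", "buy running shoes", "Regex Guide for SEO", "running shoes"]
for q in queries:
    print(q, "->", classify_intent(q))
```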

They can also be used to break down URLs via RegEx filters so you can start to understand where the traffic is going and what is driving it. The intent with which customers find a site corresponds to what page they land on. 

RegEx Patterns in Rankings 

RegEx can be used to segment ranking data by the page type of the highest-ranking URL for each keyword.

As with your GSC data, the same RegEx patterns can be used to analyze rankings for segments of keywords, such as Brand vs. Non-brand keywords.

RegEx Patterns in Site Audits

RegEx can be used to create patterns that help with string/text matching. In site audits, it can be used to: 

  • Segment crawled pages based on the URL patterns to manage crawl analysis for a large group of pages in an enterprise site. 
  • Search for text from sites while crawling. 
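Searching crawled pages for text is a typical RegEx task. The sketch below flags pages whose HTML mentions "out of stock"; the `pages` dictionary stands in for hypothetical crawler output, and the phrase is just one example of text you might audit for.

```python
import re

# Hypothetical crawl output: URL -> raw HTML. In practice this would come from your crawler.
pages = {
    "https://example.com/p/red-shoe": "<p>Sorry, this item is Out of  Stock.</p>",
    "https://example.com/p/blue-shoe": "<p>In stock and ready to ship.</p>",
}

# \s+ tolerates extra spaces or line breaks between words; IGNORECASE handles capitalization.
OUT_OF_STOCK = re.compile(r"out\s+of\s+stock", re.IGNORECASE)

flagged = [url for url, html in pages.items() if OUT_OF_STOCK.search(html)]
print(flagged)  # ['https://example.com/p/red-shoe']
```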

RegEx patterns in Bot Log Analysis

Regular expressions are also useful for bot log analysis. Log files are typically broken down and analyzed by the user agents of search engine bots.

Because the log files of large sites can record crawl activity for huge numbers of URLs, using RegEx patterns to segment the crawled URLs makes the overall analysis easier: you can filter on complex criteria instead of individual URLs.
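Here is a rough Python sketch of that workflow: parse each log line, keep only requests whose user agent mentions Googlebot, and count hits per URL segment. It assumes the common Apache/Nginx "combined" log format and uses made-up log lines and segment patterns; adapt the `LOG_LINE` pattern to your server's actual format.

```python
import re
from collections import Counter

# Assumes the Apache/Nginx "combined" log format; adjust for your server's format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
# Hypothetical URL segments for grouping crawled paths.
SEGMENTS = [("category", re.compile(r"^/(cat|category|c)/")), ("blog", re.compile(r"^/blog/"))]

def segment(path):
    for name, pattern in SEGMENTS:
        if pattern.search(path):
            return name
    return "other"

def googlebot_hits_by_segment(lines):
    """Count requests per URL segment for lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and GOOGLEBOT.search(m.group("agent")):
            counts[segment(m.group("path"))] += 1
    return counts

# Made-up log lines for illustration.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /category/shoes HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:56:01 +0000] "GET /blog/regex HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /c/hats HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(googlebot_hits_by_segment(sample))
```

User-agent strings can be spoofed, so a real audit would also verify Googlebot (Google documents a reverse DNS check for this); the sketch only inspects the string.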

How seoClarity Uses Regular Expressions

Our enterprise SEO platform allows users to slice, dice, and analyze data at scale with a few intuitive clicks.

Because regular expressions are so versatile at sorting information, we’ve built them into a variety of our platform’s features to make data analysis simple. 

If you're looking to learn the characters of RegEx, jump down to the RegEx Cheat Sheet below.

Most of the keyword and URL filters in our platform already support RegEx patterns. Beyond these on-the-fly filters, you can also define and save groupings of data based on RegEx matches.

Content Types

Content types allow for nested filtering of pages based on multiple criteria that you establish. The criteria are a set of rules joined with and/or statements, and one of the pattern lists you can create here is a RegEx pattern.

For example, if you need to create a content type for all of the category pages on your site and you know the URLs that relate to category pages conform to a few different URL patterns, you can use RegEx to combine them into a single filter for easy viewing.

If every category page contains the folder /cat/, /category/ or /c/, you can combine them into one content type with the RegEx pattern /(cat|category|c)/, which matches any of the three folders.
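A quick way to sanity-check a content type pattern is to run it against a handful of known URLs. This Python sketch uses the grouped pattern `/(cat|category|c)/` with hypothetical example.com URLs, and contrasts it with a looser pattern whose bare `c` alternative, having no slashes, matches any URL that contains the letter c.

```python
import re

# Grouping the alternatives and keeping the slashes on both sides means each
# folder must appear as a complete path segment.
CATEGORY = re.compile(r"/(cat|category|c)/")

urls = [
    "https://example.com/cat/shoes",
    "https://example.com/category/hats",
    "https://example.com/c/bags",
    "https://example.com/catalog/spring",      # "cat" is only a prefix here
    "https://example.com/blog/choosing-shoes",
]
category_pages = [u for u in urls if CATEGORY.search(u)]
print(category_pages)  # the first three URLs only

# For comparison: a bare "c" alternative (no slashes) matches any URL containing the letter c.
LOOSE = re.compile(r"/cat/|/category/|c")
print(all(LOOSE.search(u) for u in urls))  # True
```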

Here’s what that looks like in the platform:

(Creating a new Content Type in the seoClarity platform.)

Content Types are used throughout the platform.

Search Intent

Classifying keywords by their search intent is another way to segment data into meaningful groups. This aligns with the Brand vs. Non-brand example that I previously mentioned.

You can create different search intent classifications based on regular expressions.

For example, if you are the clothing retailer hm.com and want to create a Brand search intent using RegEx, you can set up the Pattern List to include the regex hm|h&m|hennes|mauritz, which matches any of those brand terms.
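The Python sketch below applies this brand pattern to a few sample queries, case-insensitively. It also shows a stricter variant with `\b` word boundaries, which is our own suggestion rather than a platform requirement: without the boundaries, the short alternative `hm` also matches inside unrelated words such as "rhythm". The queries are made up for illustration.

```python
import re

# Brand pattern from the example, applied case-insensitively.
BRAND = re.compile(r"hm|h&m|hennes|mauritz", re.IGNORECASE)
# Stricter variant (an assumption, not a platform requirement): \b word
# boundaries stop "hm" from matching inside unrelated words.
BRAND_STRICT = re.compile(r"\b(hm|h&m|hennes|mauritz)\b", re.IGNORECASE)

queries = ["h&m jeans", "Hennes and Mauritz", "hm hoodie", "rhythm guitar", "zara dress"]
for q in queries:
    print(f"{q!r}: loose={bool(BRAND.search(q))}, strict={bool(BRAND_STRICT.search(q))}")
```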

(Creating a new Search Intent in the seoClarity platform.)

Search intent can be found in the following places:

  • Rank Intelligence 
  • Search Analytics
  • Site Analytics
  • Link Clarity

Dynamic Tags

Easily sort your keywords or pages into dynamic tags by using RegEx patterns to better understand a group of keywords and URLs as a whole.

With dynamic keyword tags and dynamic page tags, any keyword or page that matches a tag's RegEx pattern is added to that tag automatically when it is added to the platform.

This alleviates the need to constantly update the tag manually.
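The mechanism behind dynamic tags can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not seoClarity's implementation: the `DynamicTagger` class and its methods are invented for the example. Each tag is stored as a compiled pattern, and every keyword added afterwards is checked against all tags.

```python
import re
from collections import defaultdict

class DynamicTagger:
    """Hypothetical sketch: tags are defined once as RegEx patterns, and every
    keyword added later is checked against them automatically."""

    def __init__(self):
        self.patterns = {}               # tag name -> compiled pattern
        self.members = defaultdict(set)  # tag name -> matching keywords

    def define_tag(self, name, regex):
        self.patterns[name] = re.compile(regex, re.IGNORECASE)

    def add_keyword(self, keyword):
        # Add the keyword to every tag whose pattern it matches.
        for name, pattern in self.patterns.items():
            if pattern.search(keyword):
                self.members[name].add(keyword)

tagger = DynamicTagger()
tagger.define_tag("Brand", r"\b(hm|h&m|hennes|mauritz)\b")
tagger.define_tag("Jeans", r"\bjeans?\b")
for kw in ["h&m jeans", "skinny jean", "winter coat"]:
    tagger.add_keyword(kw)
print(dict(tagger.members))
```

This sketch only tags keywords added after a tag is defined; back-filling keywords that already exist would need one extra pass over them.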

The tags can then be used for filters in multiple places within the platform.

Similar to the example shown for the brand search intent above, here is how you would set up a Brand Dynamic Tag in the seoClarity platform using RegEx. 

(Creating a dynamic tag.)

Tags help you filter information in the following places within the platform:

  • Rank Intelligence 
  • Page Clarity
  • Search Analytics
  • Site Analytics
  • Link Clarity
  • Research Grid

RegEx Basics (Learning RegEx)

There is a learning curve with RegEx, but once you figure it out, it's irreplaceable. Remember that RegEx is its own language, so although it takes time to master, it’s best to think of the process as an investment.

You’re going to get so much more out of it than what you put in. 

When using RegEx, it's important to note that at the root of it, everything is a character. The point of writing RegEx patterns is to match a specific sequence of these characters.

These characters include letters, digits, punctuation and other symbols. Most examples are written in plain ASCII, but most modern RegEx engines also support Unicode characters beyond the ASCII set.

Recommended Reading: Finding Additional Content: Narrow in on Specific Site Features

RegEx Cheat Sheet: Learn the Characters 

Characters in RegEx fall into two categories: metacharacters and regular characters. Metacharacters have a special meaning, while regular characters have a literal meaning.

Metacharacters are the basis upon which RegEx patterns are built. Here are the most common metacharacters and what they are used for:

  • ^ (starts with): ^www matches any string that starts with www
  • $ (ends with): com$ matches any string that ends with com
  • | (either or): left|right matches either the string left or the string right
  • . (any character): this wildcard matches any single character, so s.o matches seo, sto or s2o (and SEO when case-insensitive matching is turned on)
  • * (zero or more repetitions): xyz* matches xy followed by zero or more occurrences of z, such as xy, xyz and xyzz
  • + (one or more repetitions): xyz+ matches xy followed by one or more occurrences of z, such as xyz, xyzz and xyzzz, but not xy
  • {} (specific number of repetitions): x{3} matches the x character exactly three times, while x{3,5} matches it at least three but no more than five times
  • () (group): groups characters so they are treated as a single unit and captured; (312) matches the digits 312 (not the parentheses), and (312)+ matches 312, 312312 and so on
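The metacharacter rows can be checked directly with Python's built-in `re` module. In this sketch each assertion corresponds to one metacharacter; `re.search` looks for the pattern anywhere in the string, while `re.fullmatch` requires the whole string to match. Other RegEx engines, including those built into SEO tools, may differ in details such as whether matching is case-sensitive by default.

```python
import re

# Each assertion mirrors one metacharacter from the list.
assert re.search(r"^www", "www.example.com")           # ^ starts with
assert not re.search(r"^www", "example.com/www")
assert re.search(r"com$", "example.com")              # $ ends with
assert re.search(r"left|right", "turn right")          # | either or
assert re.fullmatch(r"s.o", "seo")                     # . any single character
assert not re.fullmatch(r"s.o", "SEO")                 # case-sensitive by default
assert re.fullmatch(r"s.o", "SEO", re.IGNORECASE)
assert all(re.fullmatch(r"xyz*", s) for s in ["xy", "xyz", "xyzz"])  # * zero or more
assert not re.fullmatch(r"xyz+", "xy")                 # + one or more
assert re.fullmatch(r"xyz+", "xyzzz")
assert re.fullmatch(r"x{3}", "xxx") and not re.fullmatch(r"x{3}", "xx")           # {n}
assert re.fullmatch(r"x{3,5}", "xxxxx") and not re.fullmatch(r"x{3,5}", "xxxxxx")  # {n,m}
m = re.search(r"(312)", "call 312-555-0100")           # () group captures "312"
assert m.group(1) == "312"
assert re.fullmatch(r"(312)+", "312312")               # quantifier applies to the whole group
print("all metacharacter checks passed")
```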

 

Special Sequences

A special sequence is written as a \ followed by a character. Here are the most commonly used special sequences:

  • \d: matches 1 digit from 0 to 9; FileName_\d\d\d would match FileName_123
  • \D: matches 1 non-digit character; FileName_\D\D\D would match FileName_aBc
  • \w: matches 1 word character (a letter from a to z or A to Z, a digit from 0 to 9, or the underscore); \w\w\w\w would match xY1_
  • \W: matches 1 non-word character; \W\W\W would match ,*-
  • \s: matches 1 whitespace character (a space, tab, line break, carriage return or form feed); \s\s\s would match three whitespace characters in a row, such as a space, a tab and a line break
  • \S: matches 1 non-whitespace character; \S\S\S would match abc
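The same approach works for special sequences. The last lines show a practical use, extracting numeric IDs from product URLs; the `/p/` URL scheme is hypothetical. Note that in Python 3, `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is set.

```python
import re

assert re.fullmatch(r"FileName_\d\d\d", "FileName_123")  # \d digit
assert re.fullmatch(r"FileName_\D\D\D", "FileName_aBc")  # \D non-digit
assert re.fullmatch(r"\w\w\w\w", "xY1_")                 # \w letter, digit or underscore
assert re.fullmatch(r"\W\W\W", ",*-")                    # \W non-word character
assert re.fullmatch(r"\s\s\s", " \t\n")                  # \s space, tab, newline, ...
assert re.fullmatch(r"\S\S\S", "abc")                    # \S non-whitespace

# Practical use: pull the numeric ID out of product URLs (hypothetical URL scheme).
ids = re.findall(r"/p/(\d+)", "/p/123 /p/4567 /p/abc")
print(ids)  # ['123', '4567']
```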

 

Sets

One of the most commonly used features in RegEx is the set (or character set). A set matches any one of several characters placed between square brackets.

The order of characters within a set does not matter, and the set as a whole matches only a single character. You can specify a range of characters within a set using a hyphen, and combinations of ranges and single characters are often used for more complex matches.

Below are some examples of character sets:

  • [abc]: returns a match where one of the specified characters (a, b or c) is present
  • [a-c]: returns a match for any lowercase character alphabetically between a and c
  • [^arn]: returns a match for any character EXCEPT a, r and n
  • [123]: returns a match where any of the specified digits (1, 2 or 3) is present
  • [0-9]: returns a match for any digit between 0 and 9
  • [0-3][0-9]: returns a match for any two-digit number from 00 to 39
  • [a-zA-Z]: returns a match for any character alphabetically between a and z, lowercase OR uppercase
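Character sets can be verified the same way. The final pattern combines sets with a quantifier to spot date-based URLs such as /2023/07/14/; it is a loose, hypothetical check that would also accept impossible dates like month 19.

```python
import re

assert re.fullmatch(r"[abc]", "b")
assert re.fullmatch(r"[a-c]", "c") and not re.fullmatch(r"[a-c]", "d")
assert re.fullmatch(r"[^arn]", "b") and not re.fullmatch(r"[^arn]", "r")
assert re.fullmatch(r"[0-3][0-9]", "39") and not re.fullmatch(r"[0-3][0-9]", "40")
assert re.fullmatch(r"[a-zA-Z]", "Q")

# Sets plus a quantifier: loosely detect date-based blog URLs like /2023/07/14/.
DATE_IN_URL = re.compile(r"/[0-9]{4}/[01][0-9]/[0-3][0-9]/")
print(bool(DATE_IN_URL.search("/blog/2023/07/14/regex-for-seo")))  # True
```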

 

RegEx Tutorials/Testing Sites

There are plenty of RegEx tutorials available to help you learn the language. I have used RegexOne in the past and found it quite helpful.

It is always a good idea to test your RegEx before you deploy it. Many free online tools let you do that, such as RegexPal or regex101.
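Alongside online testers, a small script can keep a record of what a pattern should and should not match, so you can re-run it whenever the pattern changes. The `check_pattern` helper below is a hypothetical sketch, and the sample URLs are made up; note that `re.compile` also raises `re.error` if the pattern itself is invalid.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return the inputs the pattern gets wrong (empty lists mean it behaves as intended)."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if not compiled.search(s)]
    false_hits = [s for s in should_not_match if compiled.search(s)]
    return missed, false_hits

missed, false_hits = check_pattern(
    r"/(cat|category|c)/",
    should_match=["/cat/shoes", "/category/hats", "/c/bags"],
    should_not_match=["/catalog/spring", "/blog/choosing-shoes"],
)
print(missed, false_hits)  # [] []
```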

Conclusion

Regular expressions are a valuable skill that allows you to sort through and analyze data efficiently. Although RegEx takes some time to learn, the best SEO platforms make it easy to segment data using RegEx patterns.