The more popular a page is (as measured by its place in the site hierarchy and the number of links pointing to it), the more often search engines crawl and index its content.
The flip side is that crawlers visit less popular pages far less frequently, or skip them altogether.
At first sight, this might not seem like a major problem. Well, at least not until you update some of those lower-authority pages and want Google to recrawl them.
In this post, we’ll show you how to make that happen. We’ll discuss the idea of crawl depth and how to ensure that less popular pages get crawled and indexed regularly as well.
But first, let’s do a quick recap on how Google crawls and indexes enterprise sites.
How Content Ends Up in SERPs
The search engine’s process of finding and including web pages in SERPs consists of four steps:
- Crawl, in which the search engine’s bot goes through a website to discover its content. Typically, it does so by following internal links, although crawlers can also use the sitemap for this purpose (see the sketch after this list).
- Render, at which stage the crawler renders each page to identify and analyze its content. This step is all about learning what a page is about and evaluating it (among other things) to determine its quality and authority.
- Index, when the search engine includes (or updates) the information about those pages in its index.
- Rank, which is all about assigning the page a relevant position in the Google search results for a specific query.
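To make the crawl step less abstract, here is a minimal sketch of breadth-first link discovery - the core of what a crawler does. The domain, page budget, and libraries (requests, BeautifulSoup) are our assumptions for illustration; real crawlers also respect robots.txt, add politeness delays, and render JavaScript.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "https://example.com/"  # hypothetical site
MAX_PAGES = 50                  # stand-in for a crawl budget

def crawl(start=START, budget=MAX_PAGES):
    """Breadth-first discovery of pages by following internal links."""
    domain = urlparse(start).netloc
    seen, queue = {start}, deque([start])
    fetched = 0
    while queue and fetched < budget:
        url = queue.popleft()
        fetched += 1
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # unreachable page: move on
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # stay on the same domain and visit each URL only once
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

print(f"Discovered {len(crawl())} internal URLs")
```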
Of the entire process, this post is concerned with the first step - Crawl. Unless the search engine can access, discover, and crawl a website’s assets, the site’s ability to rank for relevant keywords will be severely impacted.
However, ensuring the crawlability of the site is just the first step. Another challenge is getting the crawler to regularly visit less popular content within the site’s crawl budget, and that’s where crawl depth comes in.
What is Crawl Depth?
The term crawl depth refers to how many pages a search engine’s bot will access and index on a site during a single crawl.
Sites with a high crawl depth see many of their pages crawled and indexed. Those with a low crawl depth typically have many of their pages left uncrawled for long periods of time.
Crawl depth relates to the site’s architecture. If you consider the most common architecture, with the homepage at the top of the hierarchy and inner pages linked from it in tiers, then crawl depth defines how deep into those tiers the crawler will go.
Crawl depth is often confused with page depth. The two aren’t the same, although I can understand where the confusion is coming from.
The term - page depth - defines how many clicks a user needs to reach a specific page from the homepage, using the shortest path, of course. Using the graph above as an example, the homepage is at depth 0, right there at the top.
Pages linked directly from it would be at depth 1, those one level further down at depth 2, and so on. This is how it's reflected in our Clarity UI.
As a rule, pages at depth 3 and lower will perform worse in the organic search results. This is because search engines may have issues reaching and crawling them within the site’s allocated crawl budget.
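To make those depth numbers concrete, here is a quick sketch that computes page depth as the shortest click path from the homepage. The link graph below is entirely made up for illustration:

```python
from collections import deque

# hypothetical internal link graph: page -> pages it links to
links = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/shirts"],
    "/shop/shirts": ["/shop/shirts/blue-tee"],
    "/blog": ["/blog/post-1"],
}

def page_depths(graph, home="/"):
    """Shortest click distance from the homepage (depth 0) to every page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depth:  # first visit = shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

print(page_depths(links))
# {'/': 0, '/shop': 1, '/blog': 1, '/shop/shirts': 2,
#  '/blog/post-1': 2, '/shop/shirts/blue-tee': 3}
```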
This brings us to another issue - crawl prioritization.
What is Crawl Prioritization?
We’ve discussed how search engines find, crawl, and index content. We’ve also talked about how site architecture and page depth can affect the process.
But there is one other thing we must touch upon: the correlation between the site’s architecture and page authority (or, to put it differently, which pages earn the most links).
Typically, the homepage would earn the majority of a site’s links. Pages in tiers 2 and 3 earn few links, unfortunately. However, this shouldn’t come as a surprise when you consider the content of those pages - product categories, product pages, money pages, and so on. All of those are neither linkable assets nor pages webmasters would reference in their content often. Pages further down - blog and other informational content - can earn more links.
For example, analyzing the link profile of Ugmonk.com, a clothing store, I can see that the majority of the links point to the shop’s homepage. Granted, some of those links reference the HTTP URL, but it’s still the same homepage.
Why is this so important? Because pages with more links pointing to them appear more popular to search engines. As a result, they get a higher crawl priority than pages with fewer or no links.
In short, those popular pages get crawled more often and also serve as additional entry points for crawlers, after the homepage.
But what about the rest? Well, that’s the problem. Unless those other assets are linked from popular pages, their chances of being crawled regularly diminish greatly.
Of course, this doesn’t mean that crawlers will never access those pages (although that can be true for assets buried very deep in the site’s architecture). But as noted, their crawling potential might be significantly smaller.
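As a rough mental model (not Google’s actual scheduler), you can picture crawl prioritization as a frontier that always fetches the most-linked-to URL first, until the budget runs out. The link counts and budget below are invented:

```python
import heapq

# hypothetical inbound-link counts gathered from a previous crawl
inlinks = {"/": 120, "/blog/popular-post": 45, "/shop": 30,
           "/shop/shirts/blue-tee": 1, "/legal/terms": 0}

# max-heap via negated counts: the most-linked pages get crawled first
frontier = [(-count, url) for url, count in inlinks.items()]
heapq.heapify(frontier)

budget = 3  # pretend the crawl budget only covers three fetches
while frontier and budget > 0:
    count, url = heapq.heappop(frontier)
    print(f"crawl {url} ({-count} inlinks)")
    budget -= 1
# pages with few or no inlinks never make it into this crawl
```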
So, what to do? How can you increase crawl efficiency and prioritize less popular content?
First of all, optimize your internal linking workflow. Reduce the number of clicks required to reach pages you want to have crawled more often.
Identify opportunities to link to target pages from popular content as well. Using seoClarity’s Internal Link Analysis tool, evaluate the current internal links to the page. The screenshot below shows the analysis for pages at depth 3. Note that all of those assets have only one internal link pointing to them, suggesting an opportunity to bring them higher in the site’s hierarchy with clever interlinking.
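If you don’t have access to such a tool, a rough approximation is to count inbound internal links from your own crawl data. The link graph below is hypothetical:

```python
from collections import Counter

# hypothetical link graph from a site crawl: page -> outgoing internal links
links = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/shirts"],
    "/blog": ["/blog/post-1", "/shop"],
    "/shop/shirts": ["/shop/shirts/blue-tee"],
}

# count inbound internal links for every linked-to page
inlinks = Counter(target for targets in links.values() for target in targets)

# pages with a single inbound link are interlinking opportunities
for page, count in sorted(inlinks.items(), key=lambda kv: kv[1]):
    if count <= 1:
        print(f"{page}: {count} internal link")
```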
Also, use categories and tags, if your CMS supports them, to provide additional structure a crawler can follow to discover this content.
Create an XML sitemap and update it often. A sitemap lists all the URLs you want the search engine to index, along with information about when each page was last updated. As a rule, search engines will crawl URLs in the sitemap more often than others, so by keeping the sitemap fresh, you increase the chances of those target pages getting crawled.
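For illustration, here is a minimal sketch that generates such a sitemap with lastmod dates. The URLs and dates are placeholders, and in practice your CMS or an SEO plugin usually builds this file for you:

```python
from datetime import date
from xml.etree.ElementTree import Element, SubElement, tostring

# hypothetical pages and their last-update dates
pages = {
    "https://example.com/": date(2023, 5, 1),
    "https://example.com/shop/shirts/blue-tee": date(2023, 4, 12),
}

urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, updated in pages.items():
    url = SubElement(urlset, "url")
    SubElement(url, "loc").text = loc
    SubElement(url, "lastmod").text = updated.isoformat()  # fresh dates nudge recrawls

with open("sitemap.xml", "wb") as f:
    f.write(tostring(urlset, xml_declaration=True, encoding="utf-8"))
```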
Finally, increase page speed. A fast site reduces the time crawlers need to access and render pages, resulting in more assets being accessed within the allocated crawl budget.
(A quick note: seoClarity runs page speed analysis based on Lighthouse data to deliver the most relevant insights to drive your strategies.)
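As a rough first check (and not a substitute for a full Lighthouse audit), you can time how quickly your server responds to plain fetches; slow responses eat into the crawl budget. The URLs below are placeholders:

```python
import requests

# hypothetical URLs to spot-check for slow server responses
urls = ["https://example.com/", "https://example.com/shop"]

for url in urls:
    try:
        resp = requests.get(url, timeout=10)
        # .elapsed measures time from sending the request to parsing the response headers
        print(f"{url}: {resp.elapsed.total_seconds():.2f}s")
    except requests.RequestException as err:
        print(f"{url}: failed ({err})")
```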
Key Takeaways
- Pages with more links pointing to them are crawled more frequently.
- The higher a page’s depth, the lower its chance of being crawled within the site’s crawl budget.
- To increase the crawl frequency of less authoritative pages, improve internal linking and the site’s architecture, update the sitemap regularly, and speed up the site.