I’m sure you’re aware that unless you get pages indexed regularly, they’ll eventually perish from SERPs.
(Now, I admit, I exaggerated here a bit. Naturally, they won’t completely disappear. But, I think we all agree that any severe drop in rankings will have the same effect on your visibility as if those pages didn’t exist at all.)
The problem? With thousands of pages, how do you ensure that bots visit the most important ones - especially given that they’ll spend only a limited time crawling your site?
Solution: by optimizing your crawl budget.
And we already wrote about ways to make the most of your available crawl budget.
In this post, however, I want to share another strategy to ensure bots index the most important pages on your site – improving your internal links and pagination.
Why Bother with Crawl Budget at All
A while ago, seoClarity’s co-founder, Mitul Gandhi, wrote:
“Crawl budget isn’t something we SEOs think about very often. And, if you aren't familiar with the term 'crawl budget', it doesn't mean money available.”
While I agree with his words, I always find the limited focus on crawl budget a little worrying.
It might not have such a direct influence on your metrics or results. But at the same time, wasting crawl budget can seriously impact your pages’ visibility in SERPs, and in turn, severely affect your other efforts.
To demonstrate this, here’s the process bots go through when crawling a site, according to Google:
Notice that this is not only a complex process, but there is also a striking number of factors that can deter bots from indexing a particular URL. In the graph, Google refers to them as crawl patterns -- factors that:
“[…] prohibit the crawler from following and indexing particular URLs.” (Source)
We can easily assume that the search engine refers to a deliberate action by a webmaster that blocks a bot from accessing the site. This typically happens in a robots.txt file. But there are other aspects that can affect the crawl as well.
Popularity is one of those factors. According to many SEOs, popular domains get crawled more often.
And, it makes sense, doesn’t it? After all, why would Google waste its resources on crawling new sites often?
Freshness is another factor, which ties in with what I said about popularity. Google always wants to include the most current information about a page. This means that the more often you update your content, the more likely it is to get crawled.
Freshness and popularity aside, one thing is clear:
Poor site structure - pagination and interlinking - can prevent search engines from crawling and indexing your site.
And so, let’s look at how fixing those issues will help boost your visibility.
How Improving Internal Links and Pagination Helps Prevent Wasting the Crawl Budget
Fact: Internal links boost user experience.
They provide web visitors with reference to additional content they might find useful. In the process, internal links inform Google of other relevant pages on your site and even the keywords for which you’d like them to rank.
But, they also help with your indexing and crawl budget. Here’s how:
#1. Analyzing Internal Links Leads to Removing Crawler Roadblocks
This goes without saying - broken links and long redirect chains will stop bots in their tracks. But what you may not realize is that, at the same time, sending bots to chase URLs that simply do not exist will waste precious crawl budget.
Let me put the severity of this issue into perspective.
Imagine that your site has 100,000 URLs. Now, your Google Search Console suggests that the search engine typically crawls 1,000 URLs a day (with some of them, like the homepage, crawled more often than others).
A simple calculation (100,000 / 1,000 = 100 days) reveals that it would take over 3 months to crawl all of your content just once.
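To put numbers on that, here is a minimal Python sketch of the same arithmetic. The function name and the `wasted_per_day` parameter are my own assumptions for illustration. The 100,000 and 1,000 figures come from the example above, while the 200 wasted requests a day is a made-up figure.

```python
import math


def days_for_full_crawl(total_urls, crawled_per_day, wasted_per_day=0):
    """Estimate how many days a bot needs to fetch every URL once.

    wasted_per_day is a hypothetical count of daily requests spent on
    broken links or redirects instead of real pages.
    """
    useful_per_day = crawled_per_day - wasted_per_day
    if useful_per_day <= 0:
        raise ValueError("no crawl budget left for real pages")
    return math.ceil(total_urls / useful_per_day)


print(days_for_full_crawl(100_000, 1_000))       # 100 days, just over 3 months
print(days_for_full_crawl(100_000, 1_000, 200))  # 125 days if 200 requests a day are wasted
```

In other words, if a fifth of the daily crawl goes to dead ends, a full pass over the site takes almost a month longer.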
Now, as we’ve said before, you can optimize your site to make the most of this crawl budget. This typically involves blocking certain content from being crawled, but also removing broken links and redirects that would send crawlers to non-existent URLs.
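As a rough illustration of what spotting those roadblocks might look like, the sketch below walks through crawl data you have already collected and flags internal links that are broken or that bounce through redirects. Everything here is an assumption for the sake of the example: the `responses` dictionary format, the function names, the sample URLs, and the choice to treat unknown URLs as 404s. It is not how any particular audit tool works.

```python
REDIRECT_CODES = {301, 302, 307, 308}


def trace_redirects(url, responses, max_hops=5):
    """Follow redirects through pre-fetched crawl data.

    responses maps URL -> (status_code, redirect_target_or_None).
    URLs missing from the data are treated as 404 (an assumption).
    Returns (chain_of_urls, final_status); final_status is None when
    the chain loops or exceeds max_hops.
    """
    chain = [url]
    while True:
        status, target = responses.get(chain[-1], (404, None))
        if status not in REDIRECT_CODES or target is None:
            return chain, status
        if target in chain or len(chain) > max_hops:
            return chain + [target], None
        chain.append(target)


def find_roadblocks(internal_links, responses):
    """Return {url: problem} for links that are broken, looping, or redirected."""
    problems = {}
    for url in internal_links:
        chain, status = trace_redirects(url, responses)
        if status is None:
            problems[url] = "redirect loop or overly long chain"
        elif status >= 400:
            problems[url] = f"broken ({status})"
        elif len(chain) > 1:
            hops = len(chain) - 1
            problems[url] = f"redirects {hops} time(s) before reaching {chain[-1]}"
    return problems


# Hypothetical crawl data for a handful of internal links.
crawl_data = {
    "/old-shoes": (301, "/shoes-2019"),
    "/shoes-2019": (301, "/shoes"),
    "/shoes": (200, None),
    "/discontinued": (404, None),
}
print(find_roadblocks(["/shoes", "/old-shoes", "/discontinued"], crawl_data))
```

Every link the report flags is a candidate for either removal or a direct link to the final destination.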
Even a simple internal links audit could help you identify and fix issues with URLs, such as hyperlinks that use relative URLs, which might cause crawling and indexation issues.
In turn, this prevents bots from wasting time on pages they shouldn’t concern themselves with.
(seoClarity’s Clarity Audit report showing issues with relative URLs on a site.)
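Relative URLs are tricky because the same href resolves to a different address depending on the page it sits on. Here is a small Python sketch, using only the standard library, that flags relative hrefs and shows where a bot would end up. The function name and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse


def find_relative_links(page_url, hrefs):
    """Return (href, resolved_url) pairs for hrefs with no scheme or host."""
    flagged = []
    for href in hrefs:
        parts = urlparse(href)
        if not parts.scheme and not parts.netloc and parts.path:
            flagged.append((href, urljoin(page_url, href)))
    return flagged


# The same relative href points somewhere different on each page.
hrefs = ["category/shoes", "https://example.com/about"]
print(find_relative_links("https://example.com/blog/", hrefs))
print(find_relative_links("https://example.com/blog/post/", hrefs))
```

On the first page the link resolves to https://example.com/blog/category/shoes, on the second to https://example.com/blog/post/category/shoes. If only one of those exists, the other sends the bot to a dead end.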
#2. Interlinks Help Point Bots to Key Pages Much Faster
Take a look at a typical, hierarchical site structure. It’s one of the most popular ways to organize data on a website.
At a quick glance it reveals that a bot accessing a URL will most likely visit the homepage and the next level of content. But whether it visits anything after that is really down to pure chance, isn’t it?
Well, that or a useful interlinking strategy.
Because you can help bots reach crucial pages located deeper in your site’s structure by interlinking them with the most authoritative content.
Bots travel through sites on interlinks. And so, by referencing key pages or category-level assets, you can point bots to them much faster and ensure a quicker crawl, regardless of the crawl budget.
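One way to picture this is click depth: how many links a bot has to follow from the homepage before it reaches a page. The sketch below computes it with a breadth-first search over a tiny, made-up site. The link map, the URLs, and the function name are all assumptions for illustration.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search: minimum number of links from start to each page.

    links maps each page to the list of pages it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# A strictly hierarchical site: each level only links to the next one.
site = {
    "/": ["/shoes"],
    "/shoes": ["/shoes/running"],
    "/shoes/running": ["/shoes/running/model-x"],
}
print(click_depths(site)["/shoes/running/model-x"])  # 3

# Add one interlink from the homepage straight to the key page.
site["/"].append("/shoes/running/model-x")
print(click_depths(site)["/shoes/running/model-x"])  # 1
```

A single link from an authoritative page cuts the path to the deep page from three hops to one.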
#3. Analyzing Internal Links Helps Identify Pages with Too Many Internal Links
Now, there isn't a set rule for the number of internal links you should or shouldn't have on a page.
Content with three internal links could easily rank as well as one with hundreds.
(Although, as a general rule, you shouldn’t have more than 100 internal references per page. It’s certainly one of the issues we verify in our site audit capability, Clarity Audits.)
That’s rankings and your search visibility. When it comes to crawling, the issue is quite different.
For one, too many URLs could distract a bot, sending it in all directions instead of toward the sections of the site you’d like it to index faster. With our built-in crawler and site audit technology, Clarity Audits, you can quickly identify pages with too many internal links that could point the search engine's crawl in the wrong direction.
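For a sense of what such a check involves, here is a minimal Python sketch that counts the unique internal links in a page's HTML and compares the result with the 100-link guideline mentioned earlier. It uses only the standard library. The class and function names and the example.com pages are assumptions, and this is not a description of how Clarity Audits works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_internal_links(page_url, html):
    """Count unique link targets that share the page's host."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    resolved = {urljoin(page_url, href) for href in collector.hrefs}
    return sum(1 for url in resolved if urlparse(url).netloc == site)


def has_too_many_links(page_url, html, limit=100):
    """True when a page exceeds the (rule-of-thumb) internal link limit."""
    return count_internal_links(page_url, html) > limit


# A hypothetical category page that links to 150 product pages.
html = "".join(f'<a href="/product/{i}">Product {i}</a>' for i in range(150))
print(count_internal_links("https://example.com/shoes", html))  # 150
print(has_too_many_links("https://example.com/shoes", html))    # True
```

Duplicate links to the same URL are counted once here, which is a design choice rather than a rule. You may prefer to count every occurrence.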
#4. Interlinking Helps Pass Link Juice to Ensure Bots Reach Authoritative Pages
This goes without saying, doesn’t it? The most popular content on your site has way more link juice than the rest. And so, linking to low-performance pages from that highly authoritative content will pass at least some of that credibility on to them.
But in turn, it will also suggest those pages to the bot to crawl and index.
Simple, right? Unfortunately, there’s a catch:
With thousands of pages, trying to identify and then link to a handful of weak pages you’d like search engines to index is pretty much impossible.
Instead, you need to distribute link juice in such a way that it reaches pages deeper in the architecture.
Target top-level content such as category pages, top product pages and so on. Internal links from the homepage and other authoritative content will strengthen them, and in turn, pass link juice to pages below them in your site structure.
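To see how link juice might trickle down, the sketch below runs a simplified PageRank-style calculation over a tiny, made-up site. To be clear about the assumptions: this is a textbook-style model I'm swapping in for illustration, not how any search engine or seoClarity actually scores pages, and the function name, the damping value of 0.85, and the URLs are all hypothetical.

```python
def link_equity(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores for an internal link graph.

    links maps each page to the list of pages it links to.
    Scores always add up to 1 across the whole site.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        nxt = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly across its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # A page with no links spreads its score across the site.
                share = damping * score[page] / n
                for other in pages:
                    nxt[other] += share
        score = nxt
    return score


# "/product-b" exists but nothing links to it.
site = {
    "/": ["/category", "/about"],
    "/category": ["/product-a"],
    "/about": ["/"],
    "/product-a": ["/"],
    "/product-b": ["/"],
}
before = link_equity(site)

# Link to it from the category page, which the homepage already strengthens.
site["/category"] = ["/product-a", "/product-b"]
after = link_equity(site)

print(before["/product-b"] < after["/product-b"])  # True
```

The deep page never gets a link from the homepage itself. It benefits because the category page above it does.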
To apply this at scale, we offer Internal Link Analysis to find important pages that do not have internal links, optimize anchor text, even locate broken internal links (and more, of course).
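At its simplest, finding pages with no internal links pointing at them comes down to comparing two lists: every page you know about (from a sitemap, say) and every page that something links to. Here is a minimal Python sketch of that idea. The function name and the sample URLs are assumptions, and this is not a description of how Internal Link Analysis works.

```python
def pages_without_inbound_links(all_pages, links, home="/"):
    """Return known pages that no other page links to.

    all_pages is every URL you know about (for example, from a sitemap).
    links maps each page to the list of pages it links to.
    """
    linked = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # a page linking to itself doesn't count
    }
    return sorted(page for page in all_pages if page not in linked and page != home)


# Hypothetical site: the landing page is in the sitemap but nothing links to it.
sitemap = ["/", "/shoes", "/shoes/model-x", "/landing/summer-sale"]
links = {
    "/": ["/shoes"],
    "/shoes": ["/shoes/model-x", "/"],
}
print(pages_without_inbound_links(sitemap, links))  # ['/landing/summer-sale']
```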
Crawl budget isn’t one of the things SEOs worry about. At least, not regularly. At the same time, crawl issues can negatively affect your rankings and search visibility.
Luckily, you can make the most of your available crawl budget by analyzing your internal link structure, fixing roadblocks, and pointing bots to relevant sections of the site you’d like them to index more regularly.