Creating an XML sitemap is one of the most fundamental tasks of any SEO strategy.
It is also one of those aspects that often gets overlooked, particularly when it comes to using sitemaps to solve technical SEO issues.
In this guide, you’ll learn everything you need to know about sitemaps, from the foundational basics every SEO should know to the process of creating and submitting sitemaps to Google, as well as using sitemaps to further your site’s search visibility.
What Is a Sitemap?
A sitemap is simply an XML file that lists your website’s most important content (including videos and images).
Notice that I didn’t say ALL the content. That’s because contrary to a common belief, a sitemap does not need to list every single page you have on the site. It should, however, include information about any content that you want to show up in the search results.
This is an important distinction to remember. Following this rule will help you create a sitemap that works for your SEO benefit, rather than against it.
Why Do You Need a Sitemap?
The simplest answer is that Google uses it to discover and access the most important pages on the site.
Search engines typically find and crawl pages by following internal links.
Google’s Gary Illyes has confirmed this in a tweet.
However, not all pages will be interlinked well enough for Googlebot to find them.
Many landing pages, for instance, may exist as independent entities, not interlinked with any other content.
Other pages might be buried too deep within the site’s architecture for the bot to reach them within the available crawl budget.
A sitemap gives you the ability to tell Google which pages to access and prioritize in the crawl.
But that will work only if you follow the rule above.
Listing every page, including those you don’t want to see in the SERPs (and have most likely blocked from being crawled in the robots.txt file anyway), only clutters the roadmap.
Keeping the sitemap clean and listing only the assets you want to rank makes it much easier for Google to use.
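This rule can be applied mechanically when you assemble the URL list. The Python sketch below uses the standard library’s robots.txt parser to drop any candidate URL that the robots.txt rules disallow. The rules and URLs are made-up examples, and `indexable_urls` is a hypothetical helper name. Pages excluded by other means, such as a noindex tag, would need a separate check.

```python
from urllib.robotparser import RobotFileParser


def indexable_urls(candidates, robots_txt, user_agent="Googlebot"):
    """Keep only the URLs that robots.txt allows the given user agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in candidates if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt rules and candidate pages, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""
candidates = [
    "https://domain.com/",
    "https://domain.com/products/blue-widget",
    "https://domain.com/cart/checkout",
    "https://domain.com/search?q=widget",
]

clean_list = indexable_urls(candidates, robots_txt)
print(clean_list)  # only the homepage and the product page remain
```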
Sitemaps Can Also Help You Improve SEO Strategies
For one, a sitemap will help you monitor the most important content on the site.
Sitemaps that list only the pages in a specific section of the site, such as product pages, let you run highly targeted site audits and uncover problems at a granular level.
You can use sitemaps to evaluate the health of specific content types or sections of the site, and spot issues that you would otherwise miss in a full site crawl.
An example of a crawler set up to audit a specific sitemap only.
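As a rough illustration of section-level auditing, the Python sketch below reads a sitemap and groups its URLs by the first path segment, so each group can be reviewed on its own. The sample sitemap, the `urls_by_section` helper, and the "(home)" label are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Namespace used by the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/</loc></url>
  <url><loc>https://domain.com/products/blue-widget</loc></url>
  <url><loc>https://domain.com/products/red-widget</loc></url>
  <url><loc>https://domain.com/blog/sitemap-guide</loc></url>
</urlset>
"""


def urls_by_section(sitemap_xml):
    """Group the URLs in a sitemap by their first path segment."""
    root = ET.fromstring(sitemap_xml)
    sections = defaultdict(list)
    for loc in root.findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = segments[0] if segments else "(home)"
        sections[section].append(url)
    return dict(sections)


sections = urls_by_section(SAMPLE)
for name, urls in sections.items():
    print(name, len(urls))
```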
Sitemap Format and Requirements
A sitemap uses the XML format to hold its data. A typical sitemap file looks like this:
Note that, in spite of all the code, you can clearly see specific page URLs and associated data, like when the page was last modified.
The <url> section contains all that information.
- The <loc> tag specifies the URL. It is also the only required tag in the sitemap.
- <lastmod> contains the date of the page’s last modification.
- <priority> specifies the priority of this URL relative to other pages on the site. Values range from 0.0 to 1.0. Note how the homepage’s priority is the highest, while pages deeper in the architecture have a priority that reflects how deep they sit in the site’s structure.
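To make the structure concrete, here is a minimal Python sketch that builds a sitemap containing these three tags. The URLs, dates, and priority values are invented for the example, and `build_sitemap` is a hypothetical helper rather than part of any particular tool. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (loc, lastmod, priority) tuples.

    Only loc is required; pass None to leave out lastmod or priority.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
        if priority is not None:
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    ET.indent(urlset)  # pretty-print; available from Python 3.9
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages: the homepage gets the highest priority.
sitemap_xml = build_sitemap([
    ("https://domain.com/", "2024-01-15", 1.0),
    ("https://domain.com/blog/", "2024-01-10", 0.8),
    ("https://domain.com/blog/sitemap-guide", None, 0.6),
])
print(sitemap_xml)
```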
Sitemaps have some limitations, though:
- The file can be no larger than 50MB.
- It also cannot contain more than 50,000 URLs (whichever of the two limits is reached first).
- It has to be UTF-8 encoded.
- It should contain only pages that return the 200 (OK) status code.
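The first three limits can be checked offline before you upload a file. The Python sketch below is one way to do that, under the assumption that the sitemap is available as raw bytes; `check_sitemap_limits` and the sample data are hypothetical. Verifying the 200 (OK) status requires requesting each URL and is not covered here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB


def check_sitemap_limits(data, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Return a list of problems found in a sitemap given as raw bytes."""
    problems = []
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes, limit is {max_bytes}")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems  # no point parsing a file we cannot decode
    url_count = len(ET.fromstring(data).findall(f"{SITEMAP_NS}url"))
    if url_count > max_urls:
        problems.append(f"{url_count} URLs listed, limit is {max_urls}")
    return problems


# Hypothetical two-URL sitemap, for illustration only.
sample = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://domain.com/</loc></url>"
    b"<url><loc>https://domain.com/blog/</loc></url>"
    b"</urlset>"
)
print(check_sitemap_limits(sample))  # an empty list means no problems found
```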
An important note about the sitemap’s size limits.
As you’ve seen from the limitations above, a sitemap cannot contain more than 50,000 URLs. Although this is enough for a typical site, it’s far too few for a typical enterprise website.
What to do, then?
As there is no restriction as to how many sitemaps you can create, the simplest and most practical solution is to have more than a single sitemap. It is also one of the most effective ways to monitor individual site sections, as you’ve seen above too.
When creating multiple sitemaps, make sure to also create a sitemap index: a single file listing all the sitemaps on the site.
You can submit just the sitemap index file to Google, and the search engine will use it to access all the individual sitemaps you’ve created.
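A minimal Python sketch of this approach: split a long URL list into chunks that respect the 50,000-URL limit, then build a sitemap index that points at one sitemap file per chunk. The file naming scheme (`sitemap-products-1.xml` and so on) and both helper functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_urls(urls, max_per_file=50_000):
    """Split a list of URLs into chunks that each fit in one sitemap."""
    return [urls[i:i + max_per_file] for i in range(0, len(urls), max_per_file)]


def build_sitemap_index(sitemap_locations):
    """Build a sitemap index listing the location of each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = location
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical enterprise-sized URL list.
urls = [f"https://domain.com/products/{n}" for n in range(120_000)]
chunks = split_urls(urls)

# One sitemap file per chunk; the file names are an assumed convention.
locations = [
    f"https://domain.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(locations)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written out as its own sitemap file, and only the index needs to be submitted.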
How to Create a Sitemap
The process will largely depend on the technology used on your site. In most cases, however, you can create a sitemap directly from your Content Management System.
Most CMSs, such as WordPress or Joomla, offer dedicated plugins that create the sitemap and update it every time a page is created or updated.
(XML Sitemaps option in a WordPress plugin)
More advanced plugins will also automatically create individual sitemaps for each website section.
If your CMS doesn’t offer that functionality, you can use a crawler to create the sitemap as well.
A crawler collects all the information about pages on the site as it goes through them. That information is stored in a database, and the crawler can output it in the sitemap’s XML format.
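The core of that process can be sketched in a few lines of Python. This is a simplified breadth-first crawl over an in-memory link map rather than real HTTP requests; `crawl`, `SITE_LINKS`, and the URLs are all invented for illustration and do not describe how any specific crawler is implemented. Note that a page with no inbound links is never discovered this way.

```python
from collections import deque


def crawl(start_url, get_links):
    """Breadth-first crawl: return every URL reachable from start_url via links."""
    seen = {start_url}
    queue = deque([start_url])
    discovered = []
    while queue:
        url = queue.popleft()
        discovered.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered


# Hypothetical site: each page maps to the internal links found on it.
SITE_LINKS = {
    "https://domain.com/": [
        "https://domain.com/products/",
        "https://domain.com/blog/",
    ],
    "https://domain.com/products/": [
        "https://domain.com/products/blue-widget",
        "https://domain.com/",
    ],
    "https://domain.com/blog/": [],
    "https://domain.com/products/blue-widget": [],
    # Orphan landing page: nothing links to it, so the crawl cannot reach it.
    "https://domain.com/landing/spring-sale": [],
}

pages = crawl("https://domain.com/", lambda url: SITE_LINKS.get(url, []))
print(pages)
```

The resulting list is the raw material a crawler would then write out in the sitemap’s XML format.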
Here’s what the sitemap option looks like in seoClarity’s crawler, Clarity Audits:
With Clarity Audits, you can crawl the entire site and build a sitemap from that information. Or you can set it up to access specific sections only to create individual sitemaps.
But the crawler has a few other tricks up its sleeve.
- First of all, Clarity Audits doesn’t run from your server, and does not use up your resources.
- Because it runs from an external server, it works just like Googlebot (as opposed to a crawler running on your server).
- It does not limit the crawl speed or resources and can crawl your site using multiple droplets simultaneously.
- And it creates the sitemap dynamically as it goes through your site.
How to Submit the Sitemap to Google
Google can find the sitemap on its own. However, it is good practice to submit it to Google and let it process the data.
You do this in Google Search Console.
- First, upload your sitemap’s XML file(s) to the root directory of your server. The file should be accessible at https://domain.com/sitemap-index.xml (note that your sitemap’s filename might be different).
- Log in to Google Search Console and go to Sitemaps (in the left sidebar)
- Type in your sitemap’s filename in the section at the top of the screen. This will submit the sitemap to Google.
Below the box, you will see all your submitted sitemaps, with information about when Google last accessed them and the status of each file.
With the sitemap created and submitted, all that is left to do is to monitor it for any potential errors in Google Search Console. Also, if you’ve created the sitemap manually and not through the CMS, make sure to recreate and upload it to the server regularly to ensure that any new pages have been included.
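If you maintain the sitemap by hand, a small script can tell you when it has fallen behind. The Python sketch below compares the URLs in a sitemap against a list of currently published pages and reports the ones that are missing. The `missing_from_sitemap` helper, the sample sitemap, and the page list are assumptions for the example; in practice the published list might come from your CMS or a crawl.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_from_sitemap(sitemap_xml, published_urls):
    """Return the published URLs that the sitemap does not list yet."""
    root = ET.fromstring(sitemap_xml)
    listed = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}
    return sorted(set(published_urls) - listed)


# Hypothetical sitemap that was last regenerated before a new post went live.
current_sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/</loc></url>
  <url><loc>https://domain.com/blog/sitemap-guide</loc></url>
</urlset>
"""
published = [
    "https://domain.com/",
    "https://domain.com/blog/sitemap-guide",
    "https://domain.com/blog/new-post",
]

missing = missing_from_sitemap(current_sitemap, published)
print(missing)  # a non-empty list means the sitemap needs regenerating
```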