For enterprises that manage sites for visitors in multiple languages and countries, it can become quite cumbersome to inform search engines which version of your page is intended for which audience.
Thankfully, that’s where hreflang tags come in. They can be added at the page level (in the HTML head) or within an XML sitemap.
As a review, an XML sitemap lists the site’s pages and key information about them for Google and other search engines. The sitemaps should be linked from the robots.txt file and submitted in Google Search Console.
We covered in an earlier post how to leverage the web crawler within the seoClarity platform to create XML sitemaps, so in this post we’ll pick up where we left off and consider multi-language and multi-region sites.
To solve the challenge above, I’m going to show you how to create an hreflang XML sitemap for large enterprise sites.
Ideally, your content management system generates sitemaps that update automatically as changes are made to your site. If your CMS doesn’t do this, you can use these steps in the short term, then work with your configuration or development team to implement automatic sitemaps over time.
4 Steps to Create an Hreflang XML Sitemap
1. Start with your Current Sitemap of the Primary Site
If you have a current sitemap, start there. Crawl the URLs with a web crawler and download the results as an Excel file. This becomes the starting list of URLs for your sitemap, to which you will add the distinct language and location URLs.
Remove any URLs that do not return a 200-OK status code, such as 404s or redirecting URLs.
For redirecting URLs, replace them with the final destination URLs if they are not already included on the list.
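As a rough illustration of this cleanup, here is a minimal Python sketch. It assumes the crawl export has already been loaded into a list of dictionaries; the keys url, status and final_url are hypothetical column names, so adjust them to whatever your crawler actually exports.

```python
def clean_sitemap_urls(crawl_rows):
    """Keep 200-OK URLs, swap redirects for their destination, drop the rest.

    crawl_rows is a list of dicts with the hypothetical keys
    "url", "status" and (for redirects) "final_url".
    """
    kept = []
    seen = set()
    for row in crawl_rows:
        status = row["status"]
        if status == 200:
            candidate = row["url"]
        elif 300 <= status < 400 and row.get("final_url"):
            # Redirecting URL: use the final destination instead.
            candidate = row["final_url"]
        else:
            # 404s and other errors are removed from the list.
            continue
        if candidate not in seen:  # skip destinations already on the list
            seen.add(candidate)
            kept.append(candidate)
    return kept
```

The sketch trusts the crawler's reported destination, so the resulting list should still be re-crawled to confirm that every remaining URL returns a 200.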
2. Find Your Indexable Pages
Next, you’ll want to identify the other indexable URLs out there to include in your list of URLs for the XML sitemap.
These pages could come from a crawl of your primary website (e.g. the one targeting English in the United States), or they could be provided by your web team.
The key is to get these pages crawled in an SEO platform to confirm that they are “indexable” URLs.
Indexable URLs show a 200-OK status code and don’t have a canonical tag pointing to another page.
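To make that definition concrete, here is a small Python sketch of the check. The function names and the idea of passing in the status code and canonical URL from a crawl export are assumptions for illustration, not a feature of any particular platform.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to comparable parts (case-insensitive scheme and host)."""
    parts = urlsplit(url.strip())
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def is_indexable(url, status, canonical=None):
    """True if the page returns 200 and its canonical (if any) points to itself."""
    if status != 200:
        return False
    if not canonical:
        # No canonical tag at all: nothing points to another page.
        return True
    return normalize(canonical) == normalize(url)
```

For example, is_indexable("https://www.domain.com/widgets?x=1", 200, "https://www.domain.com/widgets") is False, because the canonical tag points to a different URL.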
In the example below, a site’s development team provided a set of URLs for an XML sitemap, but 52 of them actually redirect. Checking for this is a due diligence step before including them in the XML sitemap.
If you suspect you have URLs that are not internally linked, and so do not appear in a crawl, you could leverage an SEO keyword database to find URLs indexed and ranking in Google.
Additionally, you can export all of the indexed pages within Google Search Console to collect all the URLs hosted on your site.
With enterprise sites, there can be multiple content management systems that generate pages, as well as one-off pages added to the site over time. This collection step ensures you’re capturing as many URLs as possible to include in your sitemap.
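If you end up with several exports (a crawl, a Search Console export, a list from your web team), combining them can be sketched in a few lines of Python. The function name is hypothetical, and each source is assumed to be a simple list of URL strings.

```python
def merge_url_sources(*sources):
    """Combine several URL lists into one, dropping blanks and exact duplicates.

    Order is preserved, so URLs from the first source come first.
    """
    merged = []
    seen = set()
    for source in sources:
        for url in source:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

The merged list is only a candidate list. It still needs to be crawled to confirm which URLs are indexable.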
After the crawl, you need to replace these redirecting URLs with the final destination URL version, or remove them from the list.
Then re-crawl to confirm your indexable list: every URL should return a 200 and be a page you want to gain traffic.
For sites without hreflang tags, you can download XML sitemaps from the platform.
(These download in the .gz format, which compresses them, but you can extract them after downloading to access the .xml format.)
Then add them to your site and submit them in Google Search Console.
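If you prefer to script the extraction rather than use a desktop tool, Python's standard library can unpack a .gz file. This is a minimal sketch; the function name and the file name in the usage note are placeholders.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz(gz_path):
    """Decompress e.g. sitemap.xml.gz to sitemap.xml in the same folder."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling extract_gz("sitemap.xml.gz") writes sitemap.xml next to the download and returns its path.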
3. Set Up Your Hreflang Structure
Hreflang tags are a cornerstone for international SEO.
There are multiple ways to add hreflang tags to your site: Adding them via an XML sitemap is a good first choice because you don’t have to add any code to individual pages.
The first part of this step is to determine whether each page actually has a matching URL for every language and location you target.
How do you do this?
- Gather the international URLs.
If your site follows the same URL structure (e.g. /en-us/widgets and /en-gb/widgets) you can use an Excel formula to generate the other URLs from your primary site.
For example, you could use the concatenate function to build the URLs by combining:
“https://www.domain.com” “/en-us” “/widgets”
“https://www.domain.com” “/en-gb” “/widgets”
If your site does not follow a uniform structure across your unique language and regional pages, then you'll have to find another way to align pages. There should be some pattern in the URL to tip you off.
If the hreflang tags are already in the code of the site, seoClarity users can pull that information to find the matching pages. This assumes you’re retiring the in-page HTML hreflang tags in favor of the sitemap.
Ideally you could export a full list of pages from your site’s CMS to do the alignment.
- Align your international URLs across columns in an Excel spreadsheet, with one column per intended hreflang target.
- Generate your XML sitemap with an XML Sitemap Generator (for hreflang tags).
The generator takes your aligned international pages as input and maps each page to its version for each language and country.
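For readers who would rather script these three steps than use Excel and an online generator, the following Python sketch shows one way to do it with the standard library. It assumes the uniform URL structure from the example above (https://www.domain.com plus a locale folder plus a path). The function names are hypothetical, and this is an illustration of the output format rather than a replacement for your generator of choice.

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_aligned_rows(base, locales, paths):
    """Concatenate base + locale + path, one row per page, one key per hreflang."""
    return [{loc: f"{base}/{loc}{path}" for loc in locales} for path in paths]


def build_hreflang_sitemap(rows):
    """Return sitemap XML (bytes) where every URL lists all of its alternates.

    Each <url> entry includes a link to itself (self-referencing hreflang)
    as well as to every other language/region version in the same row.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for row in rows:
        for url in row.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            for hreflang, href in row.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": hreflang, "href": href},
                )
    buffer = io.BytesIO()
    ET.ElementTree(urlset).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = build_aligned_rows("https://www.domain.com", ["en-us", "en-gb"], ["/widgets"])
    with open("sitemap.xml", "wb") as fh:
        fh.write(build_hreflang_sitemap(rows))
```

If your site does not follow a uniform structure, skip build_aligned_rows and build the rows from your aligned spreadsheet instead, leaving out any locale where no translated page exists.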
Recommended Reading: 12 Common Hreflang Mistakes and How to Prevent Them
4. Test Your XML Sitemap
If you have access to your own site (or a developer site) you can test your XML sitemap without the help of a developer.
The test will ensure that your sitemap is in the right format and give you the chance to get it up and running.
Bluehost is a popular hosting platform, so let me show you how to test your XML sitemap within this platform (although any hosting platform will do!).
In Bluehost, choose “File Manager” then the “Public HTML” section.
Then choose “Upload” and add your .XML file. On your test site you’ll now be able to view it when you navigate to the file name right off the root domain.
For example: your-test-site.com/sitemap.xml.
Now you can see your XML file in all its glory on the web:
You can validate your hreflang sitemap at scale to check your work with web crawler and site audit technology built into an SEO platform, or with free online tools.
This is all to make sure you didn’t miss a space, a quote or anything else that would make the sitemap invalid. It is far better to catch these problems now than after you have asked your dev team to upload the file to your actual site. You also want to remove any redirecting URLs, error URLs or URLs not actually translated into the associated language.
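As a lightweight complement to those tools, a short Python sketch can catch the most basic problems before you hand the file over. It checks only three things: that the file is well-formed XML, that every URL lists itself among its alternates, and that every alternate links back. The function name is hypothetical, and the sketch does not check status codes or whether pages are actually translated.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def check_hreflang_sitemap(xml_text):
    """Return a list of problem descriptions (empty if none were found)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A missing quote or bracket ends up here.
        return [f"not well-formed XML: {exc}"]

    # Map each <loc> to the set of hrefs in its xhtml:link alternates.
    alternates = {}
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        links = url_el.findall("xhtml:link", NS)
        alternates[loc] = {link.get("href", "") for link in links}

    problems = []
    for loc, hrefs in alternates.items():
        if loc not in hrefs:
            problems.append(f"{loc}: no self-referencing hreflang entry")
        for href in sorted(hrefs):
            if href not in alternates:
                problems.append(f"{loc}: alternate {href} has no <url> entry")
            elif loc not in alternates[href]:
                problems.append(f"{loc}: alternate {href} does not link back")
    return problems
```

An empty list means the sitemap passed these basic checks. It is still worth running a full crawl-based validation for redirects and error URLs.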
Now that you have an hreflang sitemap, you can upload it to your site, submit it to Google Search Console, and check it off the list.
Recommended Reading: Does Your Site Need Self-Referencing Hreflang Tags? Hint: It Does!
Creating sitemaps is a labor of love, but if your site needs this then there’s no alternative. Collaborate with your development team so they can add an automated sitemap for your site — and have it include the last modified values too!