A canonical tag is an HTML element that informs a search engine that a group of similar or duplicate pages has a preferred version to be indexed.
In other words, if you have the same (or similar) content available under different URLs, you can use a canonical tag to specify which version Google should follow and index.
For example, if you have the two following URLs ...
example.com/c/collectibles/musicboxes.html and example.com/c/musicboxes/collectibles.html
… both of these URLs serve the same content (i.e. they both lead to the same page). By adding a canonical tag, you tell Google which of the URLs to treat as the canonical, or preferred, version. Google treats the other URLs as duplicates and crawls them less often than the canonical version.
Setting the Canonical Tag
Canonical tags use simple and consistent syntax, and are placed within the <head> section of a web page. Here is an example of a canonical tag:
<link rel="canonical" href="https://example.com/abc" />
Here’s what each part of that code means:
- link rel="canonical": The link in this tag is the master (that is, the canonical, or preferred) version of this page.
- href="https://example.com/abc": The canonical version can be found at this URL.
And here’s what that code example looks like on a live website:
(The rel=canonical tag within a page’s source code.)
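If your pages are generated from templates, the tag can be produced by a small helper. The following Python snippet is a minimal sketch, not a definitive implementation: the function name canonical_link_tag is hypothetical, and it assumes you already know the absolute canonical URL for the page.

```python
from html import escape


def canonical_link_tag(url):
    """Return a <link rel="canonical"> element for an absolute URL.

    Hypothetical helper for illustration; place the result inside <head>.
    """
    # Escape &, <, > and quotes so a query string cannot break the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link_tag("https://example.com/abc"))
# <link rel="canonical" href="https://example.com/abc" />
```

Escaping matters mostly for URLs with query strings, where a raw & would otherwise end up unescaped inside the href attribute.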
Why Are Canonical Tags Important?
Canonical tags are key in preventing duplicate content issues. Duplicate content makes it harder for Google to choose:
- Which version of the page to index
- Which version of the page to rank
- Whether it should consolidate link equity on one page, or split it between the multiple versions
Implementing the canonical tag helps to prevent these issues because it tells the search engine which page in the group is the preferred version: the canonical URL.
Too much duplicate content can also affect your crawl budget. This means Google may end up wasting time crawling multiple versions of the same page instead of discovering other important content on your website.
Here are some common reasons why your site may have duplicate, or very similar, pages:
1. Having parameterized URLs for search parameters (example.com?q=search-term).
2. Having parameterized URLs for session IDs (https://example.com?sessionid=3).
3. Having pages for different device types (example.com and m.example.com).
4. Having AMP and non-AMP versions of a page (example.com/page and amp.example.com/page).
5. Serving the same content at non-www and www variants (https://example.com and https://www.example.com).
6. Serving the same content at non-https and https variants (http://www.example.com and https://www.example.com).
7. Serving the same content with and without trailing slashes (https://example.com/page/ and https://example.com/page).
8. Serving the same content at default versions of the page such as index pages (https://www.example.com/, https://www.example.com/index.htm, https://www.example.com/index.html, https://www.example.com/index.php, https://www.example.com/default.htm, etc.).
9. Serving the same content with and without capital letters (https://example.com/page/ and https://example.com/Page/).
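Several of the variants listed above can be mapped onto one preferred URL with a few string rules. The Python snippet below is a rough sketch under stated assumptions: the preferred scheme and host, the ignored sessionid parameter, the list of index file names and the choice to drop trailing slashes are all hypothetical site policies picked for illustration, and the function preferred_url is not part of any library. It expects URLs that include a scheme, and it does not handle separate mobile or AMP hosts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site policy, chosen only for this example.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
IGNORED_PARAMS = {"sessionid"}
INDEX_FILES = {"index.htm", "index.html", "index.php", "default.htm"}


def preferred_url(url):
    """Map a URL variant onto the single URL we want to declare as canonical."""
    parts = urlsplit(url)
    # Lowercase the path (assumes the server forces lowercase URLs).
    path = parts.path.lower() or "/"
    # Collapse default index documents onto their directory URL.
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        path = head + "/"
    # Drop trailing slashes everywhere except the site root.
    if len(path) > 1:
        path = path.rstrip("/") or "/"
    # Remove parameters that do not change the content, keep the rest.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, urlencode(kept), ""))


for variant in [
    "http://example.com/Page/",
    "https://www.example.com/index.html",
    "https://example.com/page?sessionid=3",
]:
    print(variant, "->", preferred_url(variant))
```

Whether a search parameter such as q should be kept or dropped depends on the site, so this sketch leaves it in place.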
Best Practices for Canonical Tags
To avoid any duplicate content issues, it’s important to implement the canonical tag correctly. Below is a list of best practices for using the canonical tag.
1. Use absolute URLs.
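While relative URLs are allowed, it is best practice to use an absolute URL. A relative URL is resolved against whatever address the page is served from, so it can end up pointing at the wrong scheme or host (for example, on a test or HTTP copy of the site).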
Use the following structure to adhere to the absolute URL approach ...
<link rel="canonical" href="https://example.com/abc/" />
As opposed to …
<link rel="canonical" href="/abc/" />
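To see why the relative form is riskier, it helps to resolve it the way a relative link is resolved: against the URL of the page that contains it. The Python sketch below uses the standard library's urljoin. The wrapper name resolve_canonical and the staging host are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url, href):
    """Resolve a canonical href against the URL of the page that declares it."""
    return urljoin(page_url, href)


# The same relative href yields different canonicals on different hosts.
print(resolve_canonical("https://example.com/shop/item", "/abc/"))
# https://example.com/abc/
print(resolve_canonical("http://staging.example.com/shop/item", "/abc/"))
# http://staging.example.com/abc/

# An absolute href is unaffected by where the page is served from.
print(resolve_canonical("http://staging.example.com/shop/item", "https://example.com/abc/"))
# https://example.com/abc/
```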
2. Use lowercase in your URLs.
Google may view upper and lowercase URLs as two different URLs even if they point to the same destination. For this reason, force lowercase URLs on your server first, and then use lowercase URLs for your canonical tags as well.
3. Use the correct domain version (i.e. HTTPS vs. HTTP).
If your site has been switched over to HTTPS (that is, it is served over SSL/TLS), then don’t declare any HTTP URLs in your canonical tags.
4. Use self-referential canonical tags.
A self-referential canonical tag is one that points to the page’s own URL. Whether or not you have duplicate content, Google’s John Mueller says that self-referential canonical tags, while not mandatory, are recommended.
5. Use one canonical tag per page.
If a page has multiple canonical tags, Google will ignore all of them. The search engine then selects the preferred page on its own.
6. Canonicalize cross-domain duplicates.
If you control both sites, you can use the canonical tag across domains.
Common Canonical Tag Issues
Be aware of the following common issues when implementing your canonical tags.
1. Canonical points to a noindex URL.
This means that the URL in the canonical tag is set to noindex (e.g. Page A canonicalizes to Page B, which is noindex).
2. Canonical outside of the page’s <head>.
This means that the URL in question has a canonical tag element specified in the HTML, but outside of the <head>.
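A misplaced canonical can be spotted by parsing the source HTML and noting where each tag sits. The Python snippet below is a simplified sketch built on the standard library's html.parser. The class and function names are hypothetical, and it only tracks explicit <head> and </head> tags, so it will not reproduce cases where a browser closes the head early because of invalid markup. Because it collects every canonical it finds, the same list also shows whether a page declares more than one.

```python
from html.parser import HTMLParser


class CanonicalLocator(HTMLParser):
    """Records each canonical link and whether it appeared inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, was_inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr_map = dict(attrs)
            # rel can hold several space-separated tokens, in any letter case.
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.found.append((attr_map.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def locate_canonicals(html):
    locator = CanonicalLocator()
    locator.feed(html)
    locator.close()
    return locator.found


page = (
    "<html><head><title>Music boxes</title></head>"
    '<body><link rel="canonical" href="https://example.com/abc" /></body></html>'
)
print(locate_canonicals(page))
# [('https://example.com/abc', False)]
```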
3. Canonical is malformed or empty.
This means that the URL in question has a canonical tag element specified, but the canonical URL is missing or invalid. If search engines encounter a malformed or empty canonical tag, they will ignore the canonical instruction entirely.
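A basic sanity check can catch empty and obviously malformed values before they go live. The Python sketch below is an illustration under assumptions: the function name is_usable_canonical is hypothetical, and the check is deliberately stricter than what search engines accept, because it also rejects relative URLs in line with the absolute-URL recommendation earlier in this article.

```python
from urllib.parse import urlsplit


def is_usable_canonical(href):
    """Return True if href is a non-empty absolute http(s) URL with a host."""
    if href is None or not href.strip():
        return False
    try:
        parts = urlsplit(href.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ["https://example.com/abc", "", "/abc/", "htps:/example.com/abc"]:
    print(repr(candidate), is_usable_canonical(candidate))
```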
4. Canonical loop.
This means that the URL in question has a canonical tag whereby the canonical URL is actually canonicalized back to the original URL (e.g. Page A is canonicalized to Page B, which is then canonicalized back to Page A).
5. Canonical points to a URL with an error message.
This means that the URL in question has a canonical tag whereby the canonical URL itself returns an error (e.g. a 404 or 5XX status code).
6. Canonical points to a redirecting URL.
This means that the URL in question has a canonical tag whereby the canonical URL itself is redirected (e.g. Page A is canonicalized to Page B, which redirects to Page C).
7. Canonical points to another canonicalized URL.
This means that the URL in question has a canonical element whereby the canonical URL itself is also canonicalized (e.g. Page A is canonicalized to Page B, which is then canonicalized to Page C).
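Canonical loops (issue 4) and canonical chains (issue 7) can both be found by following the declarations from page to page. The Python snippet below is a minimal sketch: it assumes you have already collected each page's declared canonical into a dictionary, here called canonical_of, and the function name, status labels and hop limit are hypothetical choices for illustration.

```python
def follow_canonicals(start, canonical_of, max_hops=10):
    """Follow canonical declarations from start.

    Returns (url, status), where status is "ok", "chain", "loop"
    or "too_many_hops".
    """
    seen = [start]
    current = start
    for _ in range(max_hops):
        target = canonical_of.get(current)
        # No declaration, or a self-referential one, ends the walk.
        if target is None or target == current:
            # More than one hop from the start means a chain.
            return current, "ok" if len(seen) <= 2 else "chain"
        if target in seen:
            return target, "loop"
        seen.append(target)
        current = target
    return current, "too_many_hops"


# Hypothetical crawl data: page URL -> canonical URL declared on that page.
loop = {"/a": "/b", "/b": "/a"}
chain = {"/a": "/b", "/b": "/c", "/c": "/c"}
print(follow_canonicals("/a", loop))   # ('/a', 'loop')
print(follow_canonicals("/a", chain))  # ('/c', 'chain')
```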
8. Mismatched canonical tag in HTML and HTTP header.
This means that the URL in question has a canonical element specified both in the HTML and in the HTTP header, where the canonical URLs differ.
9. Multiple, mismatched canonical tags.
This means that the URL in question has a canonical element specified in multiple locations (either in the HTML, in the HTTP header, or in a combination of both), and that the canonical URLs specified are not the same.
10. Canonical only found in rendered DOM.
This means that the URL in question has a canonical element which is only present in the rendered DOM, and is not present in the source HTML.
11. Rendered canonical is different to HTML source.
This means that the URL in question has a canonical element in the rendered DOM, which is different to the one in the source HTML.
12. Canonical tag in HTML and HTTP header.
This means that the URL in question has a canonical tag specified both in the HTML and in the HTTP header. It is considered best practice to use only one method to specify canonicals.
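In the HTTP header, a canonical is declared with a Link header of the form Link: <https://example.com/abc>; rel="canonical". To check for the mismatch described in issue 8, the header value can be compared with the canonical found in the HTML. The Python snippet below is a simplified sketch: the function names are hypothetical, and the regular expression handles common header values only, not quoted parameters that contain commas or semicolons.

```python
import re

# Matches one link-value: <url> followed by its ;-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^,;]+)*)")


def canonical_from_link_header(header_value):
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    for match in LINK_VALUE.finditer(header_value):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rel_tokens = value.strip('"').lower().split()
            if name.lower() == "rel" and "canonical" in rel_tokens:
                return url
    return None


def canonical_sources_agree(html_canonical, link_header):
    """Return False only when both sources declare a canonical and they differ."""
    header_canonical = canonical_from_link_header(link_header) if link_header else None
    if html_canonical is None or header_canonical is None:
        return True  # Only one source declares a canonical, so nothing conflicts.
    return html_canonical == header_canonical


header = '<https://example.com/abc>; rel="canonical"'
print(canonical_from_link_header(header))
# https://example.com/abc
print(canonical_sources_agree("https://example.com/xyz", header))
# False
```

This comparison is an exact string match, so two URLs that differ only in a trailing slash or letter case are reported as a mismatch.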