For anyone involved in content creation, it's important to consider the implications that duplicate content can have on your site.
While this seems like a straightforward principle on the surface, there are actually quite a few considerations to address when it comes to duplicate content and SEO.
This piece covers what you need to know about duplicate content: what it is, when it should be avoided or allowed, and how to handle it at scale using an SEO platform.
What is Duplicate Content?
Basically, duplicate content is anything that's not original. That said, not all content needs to be original. Some duplicate content actually serves a useful function. At the same time, it's important that you're aware when you're using duplicate (as opposed to original) content. Here are some common types of duplicate content:
- Content that is simply taken from one site and republished on another. If this is done without permission, it's also plagiarism, which we'll discuss below. Even if you have permission, the content is still duplicate.
- On-site duplication. Items you publish yourself across multiple pages. In some cases you may do this deliberately, like publishing a blog post or video on one page of your website as well as others, but these do constitute duplicate pages. Longer content pieces like blog posts aren't the only thing that can be duplicate: Similar or identical meta descriptions are also duplicative.
- Content that's not identical to but very similar to another piece. This is where it gets tricky, as you may consider something unique if it's not a direct match to another item. But, for example, if you paraphrase a source using very similar language, you may be slipping into the duplicate content category. Another scenario is an ecommerce site that sells products with many variations. In this case, the wording might be almost identical except for a few details.
- Syndicated or curated content. This is content deliberately republished from another source, with credit and permission (so it's not plagiarized content). Some websites use content syndication as their primary model. You may republish something from another site or you may give permission to another site to publish your content.
Looking beyond content itself, there are also technical issues that can slyly generate duplicate content:
- URL Parameters. Duplicates can arise when you append parameters, such as tracking codes, to your URLs. According to Google, each parameter is a key and a value separated by an equals sign, and multiple parameters are joined by ampersands. So, while the URLs may appear different, the user arrives at the same page no matter which link they click.
- Session IDs. Similar to URL parameters, session IDs create duplicates when every user who visits your site is given a different ID that appears in the URL.
- Different Versions of Your Site. This applies to sites that have both a www.example.com and example.com version of their pages. This also occurs for sites that have an SSL certificate and maintain both the HTTP and HTTPS versions of their site.
- Faceted Navigation. Faceted, or filtered navigation, helps users filter details on your site to discover the information they seek, allowing them to customize their search experience. The search engine, however, may consider these filtered URL results to be duplicate content.
All of these issues, whether apparent in the content or hidden within your URLs, have the potential to reduce traffic to your target pages.
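To see how these technical issues multiply the URLs for a single page, here is a minimal Python sketch that collapses such variants into one form. The parameter names in IGNORED_PARAMS and the choice of HTTPS without "www" as the preferred version are assumptions for illustration, not rules from Google; adjust them to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking and session parameter names; use your site's own.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sid"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants into one comparable form."""
    parts = urlsplit(url)
    # Assumption: HTTP/HTTPS and www/non-www all serve the same page,
    # and HTTPS without "www." is the preferred version.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop tracking and session parameters, then sort what is left so
    # parameter order alone does not produce a "different" URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Treat "/shirts" and "/shirts/" as the same path.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))


variants = [
    "http://www.example.com/shirts?utm_source=newsletter",
    "https://example.com/shirts/?sessionid=abc123",
    "https://www.example.com/shirts",
]
unique_pages = {normalize_url(url) for url in variants}
print(unique_pages)
```

All three variants reduce to a single page. Filter parameters such as color or size are kept on purpose, since whether faceted URLs should count as separate pages is a decision for your own site.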
Is Duplicate Content Plagiarism?
Plagiarism and duplicate content overlap but have different meanings. Plagiarism means presenting someone else's work as your own. If you publish something created by someone else without permission, you may also be infringing their copyright, which can carry legal penalties. This is why there are plenty of tools that can scan for plagiarism.
Duplicate content, on the other hand, is an SEO issue, not a legal issue. Google can't fine or imprison you, but its algorithm does determine your site's ability to rank. Google rewards content that it regards as useful and authoritative. The more original your content is, the more likely it is to fit these criteria.
What Qualifies as Duplicate Content?
Google doesn't give a mathematically precise definition of duplicate content, only saying that it refers to content that's identical or "appreciably similar" to other content.
There are marketers and website owners who rewrite or "spin" articles to make them appear unique, but this is not the best practice for several reasons — not the least of which is that it doesn't help you build your own brand.
Article Spinning and Measuring Uniqueness
Spinning can be done either manually or with the help of software. AI-powered article spinners are usually programmed to make content at least 80% unique.
This approach, however, can be quite misleading. For one thing, uniqueness certainly doesn't equate to high quality. Article spinning software generally produces gibberish or, at best, barely passable articles.
Furthermore, when it comes to plagiarism, percentages are beside the point: passing off even a single sentence of someone else's work as your own is a problem. An article could be 98% unique and still plagiarized. It's best to aim for 100% uniqueness.
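To make the point about percentages concrete, here is a rough Python sketch of one way a uniqueness score could be computed: the share of five-word sequences in a text that don't appear in a source. This is an illustrative measure only, not how Google or any particular spinner or plagiarism checker scores content; the function names and sample texts are made up.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def uniqueness(candidate: str, source: str, size: int = 5) -> float:
    """Share of the candidate's shingles that do not appear in the source."""
    candidate_shingles = shingles(candidate, size)
    if not candidate_shingles:
        return 1.0  # too short to compare; treat as unique
    shared = candidate_shingles & shingles(source, size)
    return 1 - len(shared) / len(candidate_shingles)


# Made-up texts: the candidate lifts one sentence verbatim from the source.
source = "Google rewards content that it regards as useful and authoritative."
candidate = (
    "Our shop sells handmade shirts in many colors. "
    "Google rewards content that it regards as useful and authoritative. "
    "Every order ships within two days."
)

score = uniqueness(candidate, source)
print(f"Uniqueness: {score:.0%}")
```

The candidate scores as mostly unique even though one sentence is copied word for word, which is why a high percentage on its own proves little.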
Rewriting or spinning content is not a winning strategy if you're aiming to brand yourself as an authority in your space. Plus, given how easy plagiarism checks are to run, it's best to err on the side of caution and avoid any legal issues.
Is Duplicating Content Bad for SEO?
There are varying opinions on when duplicate content presents an SEO risk. Here are considerations to keep in mind.
The SEO Risks of Duplicate Content
So, just how bad is duplicate content? There are a few potential disadvantages:
1. When there are multiple versions of very similar content, Google will usually only display one.
So, everything else being equal, it's harder to rank for content that's not distinctive. If you're competing with larger, more established sites, your own content is likely to be overlooked in the search results.
2. It reduces the effectiveness of backlinks.
If identical content can be found at different URLs, backlinks are split among those URLs instead of strengthening a single page.
3. Your audience is less likely to click on your site.
Even if multiple versions of the same (or a very similar-sounding) item are displayed in search results, users are likely to choose one and ignore the other. When your content stands out, it's more likely people will click on it, as they can expect something new.
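4. Duplicate content on your own website can be confusing to Google.
When the same content lives at several of your own URLs, Google has to choose which one to index and rank, and it may not pick the page you prefer.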
5. It dilutes your brand.
Unless your business is built on the curation model, it's advantageous to brand yourself with unique content that reflects your own ideas and style.
Not Everything Has to be Original
SEO experts mostly agree that it's fine to publish a certain amount of non-original content. Many authority websites regularly repost articles using the content syndication model. Some sites, such as Huffington Post and BuzzFeed, are built on syndicating articles from other sites.
That said, these examples are highly successful, well-funded companies that have perfected the curation model.
Are There Duplicate Content Penalties?
Some SEO experts say that the duplicate content penalty is a myth. However, even if there is no direct content penalty from Google, you're still facing the above drawbacks.
You can also think of it the other way — there are undeniable SEO advantages to publishing unique content.
How to Prevent Duplicate Content
Once you're aware of the issue, it's not difficult to avoid publishing content that's not unique.
Focus on Original Work
There are several factors to keep in mind in order to ensure that your content is original.
- If you outsource content creation, make sure you're using a trustworthy source. There are cases in which freelancers will send you "original" articles that turn out to be plagiarized.
- When doing research, be careful about quoting or paraphrasing. Sometimes, you can unconsciously copy someone else's words. Quoting people is fine, but if you overdo it, this will detract from the uniqueness. When researching, use multiple sources.
- If you want to include information from other websites, you can always insert a link rather than quoting long passages or asking permission to republish the whole piece.
How to Check for Duplicate Content
There are several ways to identify duplicate content.
- Use a plagiarism checker such as Copyscape to test the uniqueness of your content.
- Paste a distinctive sentence, in quotation marks, into the Google search bar. This is a good way to spot duplicate content within your own pages and also lets you know if anyone else has taken your content without authorization.
- Use an advanced tool such as the Content Similarity Checker (Simcheck), one of the SEO content analysis tools offered by seoClarity. This tool allows you to compare the content between web pages to check for similarity by simply uploading an XLS, CSV, or TXT file. This provides a simple way to get a precise summary of similarity percentages between pages. While you can perform manual searches to identify duplicate content, using an SEO platform makes the process scalable for content teams that need to check large numbers of pages.
seoClarity clients can find the Content Similarity Checker in the platform.
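For a small number of pages, you can get a rough feel for this kind of comparison with Python's standard difflib module. The sketch below is only an illustration with made-up page texts and an arbitrary 90% threshold; it is not how Simcheck or Google measures similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical page texts keyed by URL path; in practice these would
# come from a crawl or an export of your site.
pages = {
    "/homes/austin": "Find your dream home in Austin. Browse listings and book a tour today.",
    "/homes/dallas": "Find your dream home in Dallas. Browse listings and book a tour today.",
    "/about": "We are a family-run agency founded by two former teachers.",
}


def similar_pairs(pages: dict, threshold: float = 0.9):
    """Yield (url_a, url_b, ratio) for every page pair at or above threshold."""
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        # ratio() returns a value between 0 (nothing shared) and 1 (identical).
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            yield url_a, url_b, ratio


for url_a, url_b, ratio in similar_pairs(pages):
    print(f"{url_a} vs {url_b}: {ratio:.0%} similar")
```

Only the two location pages are flagged. Comparing every pair gets slow as the page count grows, which is where a dedicated tool earns its place.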
Fix Duplicate Content With Redirects and Tags
There are several actions you can take to reduce duplicate content. Three of the most common methods are 301 redirects, canonical tags, and noindex tags. Redirects and canonical tags tell the search engines which URL to index, while a noindex tag keeps a page out of the search results altogether.
- 301 redirects. When users can reach your site using different URLs, a 301 redirect will send traffic to the URL you prefer. This ensures that Google doesn't get confused and treat the separate URLs as distinct pages.
- Canonical tags. A canonical URL is another name for your preferred URL, the one you want the search engines to rank. A rel="canonical" tag is particularly useful when you have product pages with many variations that create distinct URLs. For example, if you sell shirts that come in different sizes and colors, Google may assume that each variation is a separate page if you don't specify canonical tags.
- Noindex tags. Use the noindex tag if you don't want a certain page to appear in search results.
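As a rough illustration of what these fixes look like in practice, here is a small Python sketch. The preferred scheme and host, the function names, and the example URLs are assumptions; how you actually issue a 301 or add tags to a page depends on your server or CMS.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"      # assumption: HTTPS is the preferred version
PREFERRED_HOST = "example.com"  # assumption: the non-www host is preferred


def redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 should point to, or None if url is already preferred."""
    parts = urlsplit(url)
    if parts.scheme == PREFERRED_SCHEME and parts.netloc.lower() == PREFERRED_HOST:
        return None
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )


def canonical_tag(canonical_url: str) -> str:
    """Build the <link> element that names the preferred URL for a page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Placed in a page's <head> to keep that page out of search results.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

# A request for the HTTP "www" version should be redirected (301).
print(redirect_target("http://www.example.com/shirts?color=blue"))

# Every color/size variation of the shirt page points at one canonical URL.
print(canonical_tag("https://example.com/shirts"))
```

The redirect handles visitors and crawlers who arrive at a non-preferred version of the site, while the canonical tag covers variations that need to stay reachable, such as filtered product pages.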
Edit Your Content
It's a good idea to perform an SEO content analysis for all your content. This will help you identify not only duplicate content but any other SEO issues that need addressing. You can fix certain issues with the above-mentioned tags.
In some cases, you may want to make updates and optimizations to your content. You can remove pages that are too similar to others — some pages may be redundant and you can delete them. Others may be consolidated.
If certain pages are similar, you can make changes to create uniqueness. You might focus on different long-tail keywords so different pieces discuss similar topics using distinct language. If you have pages that are very similar (e.g. product pages, pages for different locations), it's worth taking the time to reword them.
For example, a real estate company may have many pages for different locations, with only the location names changed. You could change the wording to make the entire page unique.
Stop Content Theft
Content thieves, also known as content scrapers, are unscrupulous publishers who simply steal others' content and hope they don't get caught. On the vast web, this practice is quite common. You don't want your content republished on random, low-quality sites. That's why it's important to monitor the internet for content scraping:
- Use Google Alerts to monitor the web for mentions of your brand or products.
- If you use WordPress, you can use a plugin such as Copyright Proof to prevent theft of your images.
- Use a plagiarism detector to check for anyone copying content from your website. Copyscape offers a paid service that sends out plagiarism alerts.
- If you are aware of someone stealing your content, file a DMCA takedown notice.
Be Aware of Duplicate Content
As a content creator, you should know as much about your content as possible. This includes understanding the difference between duplicate and original content, and the implications both bring to SEO.
As we discussed, this doesn't mean that everything has to be unique. You may want to republish or curate certain items. In order to brand yourself and rank better in the search engines, however, it's to your advantage to publish mostly unique content.