Duplicate Content Penalty vs. Duplicate Content Filters – The Truth Revealed
There is a lot of confusion over duplicate content penalties and duplicate content filters. Not many know the difference between the two or whether their belief in them is in fact justified.
So let’s explore the difference between the two and shed some light on whether they are real and, if so, how they affect search engine rankings.
The Duplicate Content Penalty
A duplicate content penalty is thought to be a punishment for publishing duplicate content on a website or blog, one that can lead to the domain being de-indexed and banned from the search engines.
In response, Susan Moskwa of Google had this to say: “There’s no such thing as a ‘duplicate content penalty.’ At least, not in the way most people mean when they say that.”[i]
However, Google does say this: “Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”[iii]
They also say this: “In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”[iii]
This is perhaps how the Duplicate Content Penalty myth was born: It’s not entirely improbable that a couple of black hat sites got de-indexed and a few forum lurkers whose sites weren’t ranking well read a conversation between the black hatters in which they casually mentioned duplicate content. Our desperate lurkers who were frantically trying to find a scapegoat for their poor rankings put two and two together and made five!
How the duplicate content penalty belief started is not important. What is important is that it doesn’t exist for normal, genuinely useful websites. The only time Google will take action is when it thinks a site is being deceptive and manipulating the search results (which covers a multitude of “sins” and not just duplicate content abuse).
Google gives no indication as to what is meant by “to be deceptive and manipulate search engine results”, but they do state this in their quality guidelines: “A good rule of thumb is whether you’d feel comfortable explaining what you’ve done to a website that competes with you. Another useful test is to ask, ‘Does this help my users? Would I do this if search engines didn’t exist?’”[iv]
It would appear that Google is not worried about the content itself (or its origin), but they are concerned about how the content is used. Sven Naumann of the Google Search Quality Team has this to say: “I’d like to point out that in the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index. It simply gets filtered out.”[ii]
Duplicate Content Filters
A duplicate content filter is a mechanism search engines apply to the results for a particular search phrase: out of each group of duplicates, only the single result they determine to be the most relevant is displayed.
According to Susan Moskwa, “Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy.”[i]
In a nutshell: out of a group of near-identical pages, Google will display the webpage it thinks is the most relevant to the search phrase.
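To make the idea concrete, here is a minimal sketch in Python of what such a filter might do. It is not Google's algorithm, which is not public. The Page class, the relevance score and the filter_duplicates function are all hypothetical names invented for illustration, and the sketch only catches pages whose text is identical once case and whitespace are normalised, whereas a real filter would also catch near-identical pages.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    relevance: float  # hypothetical score; real ranking signals are not public


def fingerprint(text: str) -> str:
    """Hash the text after normalising case and whitespace."""
    normalised = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_duplicates(pages):
    """Keep only the highest-scoring page from each group of duplicates."""
    best = {}
    for page in pages:
        key = fingerprint(page.text)
        # Replace the stored page only if this duplicate scores higher.
        if key not in best or page.relevance > best[key].relevance:
            best[key] = page
    # Return the survivors ordered from most to least relevant.
    return sorted(best.values(), key=lambda p: p.relevance, reverse=True)


if __name__ == "__main__":
    results = filter_duplicates([
        Page("http://big-store.example/item", "Same product listing", 0.9),
        Page("http://affiliate.example/item", "same  product listing", 0.4),
        Page("http://review.example/item", "An original review", 0.5),
    ])
    for page in results:
        print(page.url)
```

In this sketch the affiliate page is not punished in any way. It simply never reaches the results page, because another page with the same content scored higher.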
Duplicate Content Filters and Search Engine Rankings
It is easy to blame the use of duplicate content for poor rankings, but in answer to this here’s what Susan Moskwa had to say: “We sometimes hear from Amazon.com affiliates who are having a hard time ranking for content that originates solely from Amazon. Is this because Google wants to stop them from trying to sell Everyone Poops? No; it’s because how the heck are they going to outrank Amazon if they’re providing the exact same listing? Amazon has a lot of online business authority (most likely more than a typical Amazon affiliate site does), and the average Google search user probably wants the original information on Amazon, unless the affiliate site has added a significant amount of additional value.”[i]
There are two key phrases in her statement:
1) “has a lot of online business authority” – That is the criterion that determines which of two identical pages will be displayed in the search results. It has nothing to do with originality; the page with the most authority wins.
This is certainly true of identical pages within your website. For example, a WordPress blog category page and tag page can be virtually identical, but because they have so many internal links pointing to them, they will be the pages displayed in a search result rather than the actual blog posts.
2) “the average Google search user probably wants the original information on Amazon, unless the affiliate site has added a significant amount of additional value.” – According to this, adding additional value to duplicate content will avoid the duplicate content filter and get a page listed on the same results page as those of similar content.
Conclusion: Duplicate Content and You
In the final analysis, the source of your content is not that important; it’s how you use it:
- The content must be valuable to your readers.
- You must give credit to the original authors.
- Your own contribution must add value to the original.
Unless you are engaging in some seriously questionable tactics, don’t worry about using duplicate content. Google actually states: “identical content showing up on several sites in itself is not inherently regarded as a violation of our webmaster guidelines.”[ii]
However, to give your pages the best chance of appearing in the search listings, here are four pointers to keep your duplicate pages from being filtered out:
- If you are using 100% duplicate content on a website which is genuinely useful to visitors, get as many back links to your page as possible – in other words beat your competition out of the listing.
- Vary the link text in the back links pointing to your duplicate pages so your pages are associated with other search phrases (they must be relevant though).
- If you’re uncomfortable using 100% duplicate content, make changes to the content (including the paragraph structure, sentences, punctuation etc.) and include additional keywords, then get back links which use the additional keywords as the link text.
- Use a robots.txt file to make sure only one version of your content is available for the search engines to spider – that way you don’t compete with yourself for the listing. This is important for blogs – you want your blog posts to be indexed and not RSS feeds, category or tag pages (unless you add sticky content to these pages so they are individual pages in their own right and won’t get caught in a duplicate content filter – but bear in mind they will compete with your individual blog posts).
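As a rough illustration of the robots.txt pointer, the sketch below holds a sample robots.txt and uses Python's standard urllib.robotparser module to check which URLs a compliant spider would be allowed to fetch. The /feed/, /category/ and /tag/ paths and the example.com URLs are assumptions based on a typical WordPress permalink setup, so adjust them to match your own blog.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a WordPress-style blog. The paths are
# assumptions; change them to match your own permalink structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /category/
Disallow: /tag/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://example.com/2009/05/my-post/",
        "http://example.com/tag/seo/",
        "http://example.com/category/news/",
        "http://example.com/feed/",
    ):
        status = "allowed" if crawlable(parser, url) else "blocked"
        print(f"{status}: {url}")
```

With these rules only the individual post stays open to spiders, so the feed, category and tag pages no longer compete with it for the listing.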
References:
[i] Susan Moskwa, Demystifying the “duplicate content penalty” – Retrieved 15 May 2009 from:
http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html
[ii] Sven Naumann, Duplicate content due to scrapers – Retrieved 15 May 2009 from:
http://googlewebmastercentral.blogspot.com/2008/06/duplicate-content-due-to-scrapers.html
[iii] Duplicate content – Retrieved 15 May 2009 from:
http://www.google.com/support/webmasters/bin/answer.py?answer=66359
[iv] Webmaster guidelines – Retrieved 15 May 2009 from:
http://www.google.com/support/webmasters/bin/answer.py?answer=35769
May 17th, 2009 at 5:30 pm
Thanks Colin for an informative post.
There are so many myths on the internet that it needs someone to clarify points like this.
Keith
May 25th, 2009 at 9:19 pm
Good to hear you point that out. Dup content myths cause a lot of anxiety over small details.
RSS2MYSQL is a good way to avoid rss dup content worries with rss aggregation… but even better it just enriches your keyword content.
May 25th, 2009 at 9:39 pm
RSS2MYSQL looks interesting – It’s given me an idea for mashing all my own feeds to create unique feeds – thanks
May 28th, 2009 at 1:51 pm
Finally, someone has cleared this up! I had become so frustrated with the overabundance of “rumors” on this subject. This is one of the most useful posts I’ve read in a while, Colin.