Purchasing duplicate content
-
Morning all,
I have a client who is planning to expand their product range (online dictionary sites) to new markets and are considering the acquisition of data sets from low ranked competitors to supplement their own original data. They are quite large content sets and would mean a very high percentage of the site (hosted on a new sub domain) would be made up of duplicate content. Just to clarify, the competitor's content would stay online as well.
I need to lay out the pros and cons of taking this approach so that they can move forward knowing the full facts. As I see it, this approach would mean forgoing ranking for most of the site and would need a heavy dose of original content as well as supplementing the data on page to build around the data. My main concern would be that launching with this level of duplicate data would end up damaging the authority of the site and subsequently the overall domain.
I'd love to hear your thoughts!
-
Thanks for the great response, some really useful thoughts.
To address your final point, the site is considerably stronger than the content creator's so it's reassuring to hear that this could be the case. Of course we'll be recommending that as much of the data as possible is curated and that the pages are improved with original content/
-
Wow, this is a loaded question. The way I see it we can break this up into two parts.
First, subdomains vs. domains vs. subpages. There has been a lot of discussion surrounding which structure should be used for SEO friendliness and to keep it really simple, if you're concerned about SEO then using a subpage structure is going to be the most beneficial. If you create a separate domain, that will be duplicate content and it does impact rankings. Subdomains are a little more complex, and I don't recommend them for SEO. In some cases, Google views subdomains as spam (think of all the PBNs created with blogspot.com) and in other cases it's viewed as a separate website. By structuring something as a subdomain you're indicating that the content is different enough from the main content of the root domain that you don't feel it should be included together. An example of this being used in the wild appropriately might be different language versions of a website, which especially makes sense in countries where the TLD doesn't represent multiple languages (like Switzerland - they have four national languages).
Next, the concept of duplicate content is different depending on whether it's duplicate internally, or duplicate externally. It's common for websites to have a certain amount of duplicate or common content within their own website. The number that has been repeated for years as a "safe" threshold is 30%, which is a stat that Matt Cutts threw out there before he retired. I use siteliner.com to discover how much common content has been replicated internally. Externally, if you have the same content as another website, this can pretty dramatically impact your rankings. Google does a decent job of assigning content to the correct website (who had it first, etc.) but they have a long way to go.
If you could assimilate the new content and have the pages redirected on a 1:1 basis to the new location then it's probably safe enough to do, and hopefully you will have it structured in a way that makes it useful to users. If you can't perform the redirect, I think you're more likely to struggle with achieving SEO goals for those new pages. In that case, take the time to set realistic expectations and track something like user engagement between new and old content so you have a realistic understanding of your success and challenges.
-
I would be thinking about these topics....
** How many other companies are purchasing or have purchased this data? Is it out there on lots of sites and the number is growing?
** Since this is a low-ranking competitor, how much additional money would be required to simply buy the entire company (provided that the data is not already out there on a ton of other websites.)
** Rather than purchasing this content, what would be the cost of original authorship for just those words that produce a big bulk of the traffic. Certainly 10% of the content produces over 50% of the traffic on most reference sites.
** With knowledge that in most duplicate content situations, a significantly stronger site will crush the same content on the original publisher.... where do I sit in this comparison of power?
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Duplicate Content
We have multiple collections being flagged as duplicate content - but I can't find where these duplications are coming from? The duplicate content has no introductory text, and no meta description. Please see examples:- This is the correct collection page:-
Technical SEO | | Caroline_Ardmoor
https://www.ardmoor.co.uk/collections/deerhunter This is the incorrect collection page:-
https://www.ardmoor.co.uk/collections/vendors How do I stop this incorrect page from showing?0 -
Site Crawl -> Duplicate Page Content -> Same pages showing up with duplicates that are not
These, for example: | https://im.tapclicks.com/signup.php/?utm_campaign=july15&utm_medium=organic&utm_source=blog | 1 | 2 | 29 | 2 | 200 |
Technical SEO | | writezach
| https://im.tapclicks.com/signup.php?_ga=1.145821812.1573134750.1440742418 | 1 | 1 | 25 | 2 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=blog&utm_campaign=brightpod-article | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=marketplace&utm_campaign=homepage | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=blog&utm_campaign=first-3-must-watch-videos | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?_ga=1.159789566.2132270851.1418408142 | 1 | 5 | 31 | 2 | 200 |
| https://im.tapclicks.com/signup.php/?utm_source=vocus&utm_medium=PR&utm_campaign=52release | Any suggestions/directions for fixing or should I just disregard this "High Priority" moz issue? Thank you!0 -
What is the best way to handle these duplicate page content errors?
MOZ reports these as duplicate page content errors and I'm not sure the best way to handle it. Home
Technical SEO | | ElykInnovation
http://myhjhome.com/
http://myhjhome.com/index.php Blog
http://myhjhome.com/blog/
http://myhjhome.com/blog/?author=1 Should I just create 301 redirects for these? 301 http://myhjhome.com/index.php to http://myhjhome.com/ ? 301 http://myhjhome.com/blog/?author=1 to http://myhjhome.com/ ? Or is there a better way to handle this type of duplicate page content errors? and0 -
How can something be duplicate content of itself?
Just got the new crawl report, and I have a recurring issue that comes back around every month or so, which is that a bunch of pages are reported as duplicate content for themselves. Literally the same URL: http://awesomewidgetworld.com/promotions.shtml is reporting that http://awesomewidgetworld.com/promotions.shtml is both a duplicate title, and duplicate content. Well, I would hope so! It's the same URL! Is this a crawl error? Is it a site error? Has anyone seen this before? Do I need to give more information? P.S. awesomewidgetworld is not the actual site name.
Technical SEO | | BetAmerica0 -
What to do about similar content getting penalized as duplicate?
We have hundreds of pages that are getting categorized as duplicate content because they are so similar. However, they are different content. Background is that they are names and when you click on each name it has it's own URL. What should we do? We can't canonical any of the pages because they are different names. Thank you!
Technical SEO | | bonnierSEO0 -
Block Quotes and Citations for duplicate content
I've been reading about the proper use for block quotes and citations lately, and wanted to see if I was interpreting it the right way. This is what I read: http://www.pitstopmedia.com/sem/blockquote-cite-q-tags-seo So basically my question is, if I wanted to reference Amazon or another stores product reviews, could I use the block quote and citation tags around their content so it doesn't look like duplicate content? I think it would be great for my visitors, but also to the source as I am giving them credit. It would also be a good source to link to on my products pages, as I am not competing with the manufacturer for sales. I could also do this for product information right from the manufacturer. I want to do this for a contact lens site. I'd like to use Acuvue's reviews from their website, as well as some of their product descriptions. Of course I have my own user reviews and content for each product on my website, but I think some official copy could do well. Would this be the best method? Is this how Rottentomatoes.com does it? On every movie page they have 2-3 sentences from 50 or so reviews, and not much unique content of their own. Cheers, Vinnie
Technical SEO | | vforvinnie1 -
Using robots.txt to deal with duplicate content
I have 2 sites with duplicate content issues. One is a wordpress blog. The other is a store (Pinnacle Cart). I cannot edit the canonical tag on either site. In this case, should I use robots.txt to eliminate the duplicate content?
Technical SEO | | bhsiao0 -
Duplicate content
This is just a quickie: On one of my campaigns in SEOmoz I have 151 duplicate page content issues! Ouch! On analysis the site in question has duplicated every URL with "en" e.g http://www.domainname.com/en/Fashion/Mulberry/SpringSummer-2010/ http://www.domainname.com/Fashion/Mulberry/SpringSummer-2010/ Personally my thoughts are that are rel = canonical will sort this issue, but before I ask our dev team to add this, and get various excuses why they can't I wanted to double check i am correct in my thinking? Thanks in advance for your time
Technical SEO | | Yozzer0