How do I use public content without being penalized for duplication?
-
The NHTSA produces a list of all recalls for automobiles. In their "terms of use" it states that the information can be copied. I want to add that to our site, so there is an up-to-date list for our audience to see. However, I'm just copying and pasting. I'm allowed to according to NHTSA, but google will probably flag it right? Is there a way to do this without being penalized?
Thanks,
Ruben
-
I didn't think about other sites, but that's a fabulous point. Best to play it safe.
Thanks for your input!
- Ruben
-
My gut says that your idea to keep the content noindexed is best. Even if the content is unique it borders on the area of auto-generated. Now, I might change your mind if there were a lot of users that were actively interacting with most of these pages. If not though, then you'll end up having a large portion of your site consisting of auto-generated content that doesn't seem useful. Plus...it's also possible that other sites are using this information so you would end up having content that is duplicated on other sites too.
I could be wrong, but my gut says not to try to use this content for ranking purposes.
-
I appreciate the follow up, Marie. Please give me your thoughts on the following idea;
The NHTSA only posts the updates for the past month. If I noindex the page for now (which is what I'm doing) and wait five months, then what would happen? At that point, yes, the current month would be duplicated, but I'd have four months of "unique" content because the NHTSA deletes there's. Plus, I could add pictures of all the automobile, too. Do you think that would be enough to index it?
(I'm most likely going to keep it noindex, because this borders on shady, or at least, I could see google taking it that way, but just as a thought experiment, what do you think?) Or anyone else?
Thanks,
Ruben
-
To expand on EGOL's answer, if you are taking someone else's content (even with their permission) and wanting Google to index it then Google can see that you have a large amount of copied content on your site. This can trigger the Panda filter and can cause Google to consider your whole site as low quality.
You can add a noindex tag as EGOL suggested or you could use a canonical tag to show Google who the originator of the content is, but probably the noindex tag is easiest.
There is one other option as well. If you think it is possible that you can add significant value to the content that is being provided then you can still keep it indexed. If you can combine the recall information with other valuable information then that might be ok to index. But, you have to truly be providing value, not just padding the page with words to make it look unique.
-
Alright, that sounds good. Thanks!
-
I had a bunch of republished articles on my website, done mostly at the request of government agencies and universities. That site got hit in one of the early Panda updates. So, I deleted a lot of that content and to the rest I added this line above the tag
name="robots" content="noindex, follow" />
That tells google not to index the page but to follow the links and allow pagerank to flow through. My site recovered a few weeks later.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to gain or build high authority backlinks without content?
P.S. please suggest me the latest tips and tricks along with the known and lesser known facts regarding the niche question.
Intermediate & Advanced SEO | | HuptechWebseo0 -
Is this the correct way of using rel canonical, next and prev for paginated content?
Hello Moz fellows, a while ago (3-4 years ago) we setup our e-commerce website category pages to apply what Google suggested to correctly handle pagination. We added rel "canonicals", rel "next" and "prev" as follows: On page 1: On page 2: On page 3: And so on, until the last page is reached: Do you think everything we have been doing is correct? I have doubts on the way we have handled the canonical tag, so, any help to confirm that is very appreciated! Thank you in advance to everyone.
Intermediate & Advanced SEO | | fablau0 -
Sites in multiple countries using same content question
Hey Moz, I am looking to target international audiences. But I may have duplicate content. For example, I have article 123 on each domain listed below. Will each content rank separately (in US and UK and Canada) because of the domain? The idea is to rank well in several different countries. But should I never have an article duplicated? Should we start from ground up creating articles per country? Some articles may apply to both! I guess this whole duplicate content thing is quite confusing to me. I understand that I can submit to GWT and do geographic location and add rel=alternate tag but will that allow all of them to rank separately? www.example.com www.example.co.uk www.example.ca Please help and thanks so much! Cole
Intermediate & Advanced SEO | | ColeLusby0 -
How to Fix Duplicate Page Content?
Our latest SEOmoz crawl reports 1138 instances of "duplicate page content." I have long been aware that our duplicate page content is likely a major reason Google has de-valued our Web store. Our duplicate page content is the result of the following: 1. We sell audio books and use the publisher's description (narrative) of the title. Google is likely recognizing the publisher as the owner / author of the description and our description as duplicate content. 2. Many audio book titles are published in more than one format (abridged, unabridged CD, and/or unabridged MP3) by the same publisher so the basic description on our site would be the same at our Web store for each format = more duplicate content at our Web store. Here's are two examples (one abridged, one unabridged) of one title at our Web store. Kill Shot - abridged Kill Shot - unabridged How much would the body content of one of the above pages have to change so that a SEOmoz crawl does NOT say the content is duplicate?
Intermediate & Advanced SEO | | lbohen0 -
Does having a page that ends with ? cause duplicate content?
I am working on a site that has lots of dynamic parameters. So lets say we have www.example.com/page?parameter=1 When the page has no parameters you can still end up at www.example.com/page? Should I redirect this to www.example.com/page/ ? Im not sure if Google ignores this, or if these pages need to be dealt with. Thanks
Intermediate & Advanced SEO | | MarloSchneider0 -
Duplicate Content Question
Brief question - SEOMOZ is teling me that i have duplicate content on the following two pages http://www.passportsandvisas.com/visas/ and http://www.passportsandvisas.com/visas/index.asp The default page for the /visas/ directory is index.asp - so it effectively the same page - but apparently SEOMOZ and more importantly Google, etc treat these as two different pages. I read about 301 redirects etc, but in this case there aren't two physical HTML pages - so how do I fix this?
Intermediate & Advanced SEO | | santiago230 -
Is this duplicate content something to be concerned about?
On the 20th February a site I work on took a nose-dive for the main terms I target. Unfortunately I can't provide the url for this site. All links have been developed organically so I have ruled this out as something which could've had an impact. During the past 4 months I've cleaned up all WMT errors and applied appropriate redirects wherever applicable. During this process I noticed that mydomainname.net contained identical content to the main mydomainname.com site. Upon discovering this problem I 301 redirected all .net content to the main .com site. Nothing has changed in terms of rankings since doing this about 3 months ago. I also found paragraphs of duplicate content on other sites (competitors in different countries). Although entire pages haven't been copied there is still enough content to highlight similarities. As this content was written from scratch and Google would've seen this within it's crawl and index process I wanted to get peoples thoughts as to whether this is something I should be concerned about? Many thanks in advance.
Intermediate & Advanced SEO | | bfrl0 -
How to deal with category browsing and duplicate content
On an ecommerce site there are typically a lot of pages that may appear to be duplications due to category browse results where the only difference may be the sorting by price or number of products per page. How best to deal with this? Add nofollow to the sorting links? Set canonical values that ignore these variables? Set cononical values that match the category home page? Is this even a possible problem with Panda or spiders in general?
Intermediate & Advanced SEO | | IanTheScot0