Duplicate content mess
-
One website I'm working with keeps an HTML archive of content from the various magazines they publish. Some articles were repeated across different magazines, sometimes up to 5 times. These articles were also used as content elsewhere on the same website, resulting in up to 10 duplicates of the same article on one website.
With regard to the 5 duplicates that are not part of a magazine, I can delete (resulting in a 404) all but the highest-value copy of each (most don't have any external links). There are hundreds of occurrences of this, and it seems unfeasible to 301 or noindex them.
After seeing how their system works, I can canonical the remaining non-magazine duplicate to the corresponding original magazine version, but I can't canonical any of the other magazine versions to the original. I can't delete the other duplicates as they're part of the content of a particular issue of a magazine. The best thing I can think of doing is adding a link in the magazine duplicates to the original article, something along the lines of "This article originally appeared in...", though I get the impression the client wouldn't want to reveal that they used to share so much content across different magazines.
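For anyone following along, the canonical I have in mind is just a link element in the head of the non-magazine copy. A minimal sketch of generating it, assuming a made-up example.com URL (the real paths and the client's templating system are not shown here, and `canonical_tag` is a hypothetical helper):

```python
from html import escape


def canonical_tag(original_url: str) -> str:
    """Build the <link> element that points a duplicate at its original."""
    # escape() guards against quotes or ampersands in the URL breaking the attribute.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


# Hypothetical URL of the original magazine version of an article.
tag = canonical_tag("https://www.example.com/magazines/issue-12/story-1.html")
print(tag)
```

The output would go in the `<head>` of each duplicate that is allowed to carry a canonical.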
The duplicate pages across the different magazines do differ slightly as a result of the different Contents menu for each magazine.
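To put a rough number on "differ slightly", something like the sketch below could be run over the page text. It uses Python's standard `difflib`; the sample pages and the 0.8 threshold are assumptions for illustration, not figures from the real site.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 ratio of how alike two page texts are, compared word by word."""
    # autojunk=False stops difflib from discarding very common words on long pages.
    matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
    return matcher.ratio()


# Hypothetical pages: the same article body under two different Contents menus.
article = "Start seeds indoors six weeks before the last frost of the season. " * 5
page_a = "Contents: News | Features | Letters\n" + article
page_b = "Contents: Reviews | Interviews | Classifieds\n" + article

score = similarity(page_a, page_b)
is_near_duplicate = score > 0.8  # threshold is an arbitrary assumption
print(round(score, 2), is_near_duplicate)
```

A score close to 1.0 would suggest the menu difference alone is unlikely to make the pages look unique.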
Do you think what I'm doing will at least be better than how it was, or is there something further I can do? Is adding the links enough?
Thanks.
-
You're right about the 301s, and noindex would be a massive task that I'm not sure is worthwhile. Also I'm not sure if I want to list hundreds of pages in robots.txt.
By "back to back" do you mean "compare link metrics"? A lot of these pages show as "No Data Available for this URL". Some of them are quite deep within the site, so I don't know whether that's the reason or whether Mozscape can tell that they're duplicate content. The articles that are not part of the magazines usually seem to have a PA of 30+ judging by my spot-checks, but even some of those duplicated from magazine articles (and outside of the magazines) have no data available despite being easier to crawl than the magazine content.
-
If adding meta tags, redirects, etc. to all of the pages is too labor-intensive and the SEO return from those pages is low, then perhaps you could just block search engines' access to certain sections of the website via the robots.txt file.
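As a rough sketch of what that could look like, here is a single Disallow rule checked with Python's standard `urllib.robotparser`. The `/articles/duplicates/` path and the example.com URLs are placeholders I made up; the real section names would depend on how the site is organised.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule covering a whole section instead of hundreds of URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/duplicates/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs: one inside the blocked section, one in a magazine issue.
duplicate_crawlable = parser.can_fetch(
    "Googlebot", "https://www.example.com/articles/duplicates/story-1.html"
)
magazine_crawlable = parser.can_fetch(
    "Googlebot", "https://www.example.com/magazines/issue-12/story-1.html"
)
print(duplicate_crawlable, magazine_crawlable)
```

This only works neatly if the duplicates live under a common path. Keep in mind that robots.txt stops crawling, so a blocked URL that is linked from elsewhere can still show up in the index without its content.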
-
Given the way Alex describes the separate magazines, I am thinking they wouldn't like having the 301 redirects from a branding perspective. I like the idea of adding an attribution link to the original article. I have doubts about the "noindex" because I think that in many cases Google completely ignores this attribute, so I'm not sure it's worth going to all that trouble.
Have you tried putting the "duplicates" back to back in Open Site Explorer? I am really curious to know what that looks like.
-
Instead of deleting, you can just noindex + add a link to the original article.
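A minimal sketch of the two pieces of markup, assuming a made-up URL and magazine title (the helper name `attribution_link` is hypothetical, and how the markup gets into the templates is up to the client's system):

```python
from html import escape

# Standard robots meta tag asking search engines not to index the page.
NOINDEX_META = '<meta name="robots" content="noindex">'


def attribution_link(original_url: str, magazine_title: str) -> str:
    """Build the visible 'originally appeared in' link for a duplicate page."""
    href = escape(original_url, quote=True)
    title = escape(magazine_title)
    return f'<p>This article originally appeared in <a href="{href}">{title}</a>.</p>'


# Hypothetical original article and magazine name.
link_html = attribution_link(
    "https://www.example.com/magazines/issue-12/story-1.html", "Example Magazine"
)
print(NOINDEX_META)
print(link_html)
```

The meta tag goes in the `<head>` of the duplicate and the link goes in the article body.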
-
Instead of deleting, you can 301 redirect to the original article.
This removes the duplicate content issue for every URL you redirect.
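With hundreds of pages, a lookup table is probably easier to manage than individual rules. A minimal sketch, assuming hypothetical paths (the mapping and the `resolve` helper are illustrative only, not tied to any particular server):

```python
# Hypothetical mapping of duplicate paths to the original article path.
REDIRECTS = {
    "/news/story-1-copy.html": "/magazines/issue-12/story-1.html",
    "/features/story-1.html": "/magazines/issue-12/story-1.html",
}


def resolve(path: str) -> tuple[int, str]:
    """Return (status, location): 301 to the original for a known duplicate, else 200."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, path


print(resolve("/news/story-1-copy.html"))
print(resolve("/magazines/issue-12/story-1.html"))
```

The same table could be exported into whatever redirect format the web server uses.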
-
Related Questions
-
May Faceted Navigation via ajax #parameter cause duplicated content issues?
We are going to implement a faceted navigation for an ecommerce site of about 1000 products.
Faceted navigation is implemented via ajax/javascript which adds to the URL a large number of #parameters.
Faceted pages are canonicalizing to page without any parameters. We do not want google to index any of the faceted pages at this point. Will google include pages with #parameters in their index?
Can I tell google somehow to ignore #parameters and not to index them?
Could this setup cause any SEO problems for us in terms of crawl bandwidth and or link equity?
Intermediate & Advanced SEO | | lcourse
-
About duplicate content
We have two products: a loan for a new car and a loan for a second-hand car. Except for the title tag, meta description and H1, the content is of course very similar. Are these pages considered duplicate content?
https://new.kbc.be/product/lenen/voertuig/autolening-tweedehands-auto.html
https://new.kbc.be/product/lenen/voertuig/autolening-nieuwe-auto.html
Thanks for the advice,
Intermediate & Advanced SEO | | KBC
-
A/B Testing - Should I add product descriptions on my category landing pages as well as on product pages, and if so, how do I do this to avoid duplicate content?
Hi All, I recently relaunched my tool hire eCommerce website with a new design and now display my products in grid form on my category landing pages, as opposed to just the list view we had on the old design. My bounce rates are a lot higher than they used to be, and my gut instinct is telling me maybe this is wrong. I want to do some A/B testing using a list view. Previously, in our list views we just showed the images and pricing and had on-page content at the bottom of the page. The user would click on the product image and would then be taken to the product page, which has the product description, T&Cs, etc. If I were to do this in my A/B testing but change it so we also displayed the product descriptions on the category landing pages, is there a special way to do this? In effect, we would have duplicate content, as the product descriptions are also on the product page. Does anyone have any thoughts on whether it's a no-no from an SEO point of view? Here's a short URL link to one of my category pages - http://goo.gl/QJv5gw Historically we used to rank well for the category landing pages and not for the product pages. Our rankings are down and bounce rates are higher, so I am trying to sort both. We have good content on pages etc. Any advice greatly appreciated as always. Thanks, Pete
Intermediate & Advanced SEO | | PeteC120 -
Duplicate Content / Canonical Conundrum on E-Commerce Website
Hi all, I’m looking for some expert advice on use of canonicals to resolve duplicate content for an e-Commerce site. I’ve used a generic example to explain the problem (I do not really run a candy shop).
SCENARIO
I run a candy shop website that sells candy dispensers and the candy that goes in them. I sell about 5,000 different models of candy dispensers and 10,000 different types of candy. Much of the candy fits in more than one candy dispenser, and some candy dispensers fit exactly the same types of candy as others. To make things easy for customers who need to fill up their candy dispensers, I provide a “candy finder” tool on my website which takes them through three steps:
1. Pick your candy dispenser brand (e.g. Haribo)
2. Pick your candy dispenser type (e.g. soft candy or hard candy)
3. Pick your candy dispenser model (e.g. S4000-A)
RESULT: The customer is then presented with a list of candy products that they can buy, on a URL like this:
Candy-shop.com/haribo/soft-candy/S4000-A
All of these steps are presented as HTML pages with followable/indexable links.
PROBLEM: There is a duplicate content issue with the results pages. This is because a lot of the candy dispensers fit exactly the same candy (e.g. S4000-A, S4000-B and S4000-C). This means that the content on these pages is basically the same because the same candy products are listed. I’ll call these the “duplicate dispensers”, e.g.
Candy-shop.com/haribo/soft-candy/S4000-A
Candy-shop.com/haribo/soft-candy/S4000-B
Candy-shop.com/haribo/soft-candy/S4000-C
The page titles/headings change based on the dispenser model, but that’s not enough for the pages to be deemed unique by Moz. I want to drive organic traffic from searches for the dispenser model candy keywords, but I’m guessing that duplicate content like this is holding these dispenser pages back from ranking.
SOLUTIONS
1. Write unique content for each of the duplicate dispenser pages: Manufacturers add or discontinue about 500 dispenser models each quarter and I don’t have the resources to keep on top of this content. I would also question the real value of this content to a user when it’s pretty obvious what the products on the page are.
2. Pick one duplicate dispenser to act as a rel=canonical and point all its duplicates at it. This doesn’t work as dispensers get discontinued, so I run the risk of randomly losing my canonicals or them changing as models become unavailable.
3. Create a single page with all of the duplicate dispensers on, and canonical all of the individual duplicate pages to that page, e.g.
Canonical: candy-shop.com/haribo/soft-candy/S4000-Series
Duplicates (which all point to canonical):
candy-shop.com/haribo/soft-candy/S4000-Series?model=A
candy-shop.com/haribo/soft-candy/S4000-Series?model=B
candy-shop.com/haribo/soft-candy/S4000-Series?model=C
PROPOSED SOLUTION
Option 3. Anyone agree/disagree or have any other thoughts on how to solve this problem? Thanks for reading.
Intermediate & Advanced SEO | | webmethod0 -
Duplicate Content for Deep Pages
Hey guys, For deep, deep pages on a website, does duplicate content matter? The pages I'm talking about are image pages associated with products. They will never rank in Google, which doesn't concern me. What I'm interested to know, though, is whether the duplicate content would have an effect on the site as a whole. Thanks in advance, Paul
Intermediate & Advanced SEO | | kevinliao1 -
Duplicate Content Issue
Why are URLs with .html or index.php at the end a problem for search engines? I heard they can create duplicate content, but I have no idea why. Could someone explain why that is? Thank you
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Duplicate content - canonical vs link to original and Flash duplication
Here's the situation for the website in question: The company produces printed publications which go online as a page turning Flash version, and as a separate HTML version. To complicate matters, some of the articles from the publications get added to a separate news section of the website. We want to promote the news section of the site over the publications section. If we were to forget the Flash version completely, would you: a) add a canonical in the publication version pointing to the version in the news section? b) add a link in the footer of the publication version pointing to the version in the news section? c) both of the above? d) something else? What if we add the Flash version into the mix? As Flash still isn't as crawlable as HTML should we noindex them? Is HTML content duplicated in Flash as big an issue as HTML to HTML duplication?
Intermediate & Advanced SEO | | Alex-Harford0 -
Help With Preferred Domain Settings, 301 and Duplicate Content
I've seen some good threads developed on this topic in the Q&A archives, but feel this topic deserves a fresh perspective as many of the discussions were almost 4 years old. My webmaster tools preferred domain setting is currently non www. I didn't set the preferred domain this way; it was like this when I first started using WM tools. However, I have built the majority of my links with the www, which I've always viewed as part of the web address. When I put my site into an SEO Moz campaign it recognized the www version as a subdomain, which I thought was strange, but now I realize it's due to the www vs. non www preferred domain distinction. A look at site:mysite.com shows that Google is indexing both the www and non www versions of the site. My site appears healthy in terms of traffic, but my sense is that a few technical SEO items are holding me back from a breakthrough. QUESTION to the SEOmoz community: What the hell should I do? Change the preferred domain settings? 301 redirect from the non www domain to the www domain? Google suggests this: "Once you've set your preferred domain, you may want to use a 301 redirect to redirect traffic from your non-preferred domain, so that other search engines and visitors know which version you prefer." Any insight would be greatly appreciated.
Intermediate & Advanced SEO | | JSOC1