Strange duplicate content issue
-
Hi there,
The SEOmoz crawler has identified a set of duplicate pages that we are struggling to resolve.
For example, the crawler picked up that this page www.creative-choices.co.uk/industry-insight/article/Advice-for-a-freelance-career is a duplicate of this page www.creative-choices.co.uk/develop-your-career/article/Advice-for-a-freelance-career.
The latter page's content is the original and can be found in the CMS admin area whilst the former page is the duplicate and has no entry in the CMS. So we don't know where to begin if the "duplicate" page doesn't exist in the CMS.
The crawler states that this page www.creative-choices.co.uk/industry-insight/inside/creative-writing is the referrer page. Looking at it, only the original page's link is showing on the referrer page, so how did the crawler get to the duplicate page?
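One way to double-check what the referrer page really links to is to resolve every href in its source the way a crawler would, rather than reading the rendered page. Here's a rough sketch of that check. The sample markup is made up for illustration (it is not the real page source), the http:// scheme is an assumption, and `LinkCollector` and `links_to_path` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs the same way a crawler would.
                self.links.append(urljoin(self.base_url, value))


def links_to_path(html, base_url, path):
    """Return every resolved link in `html` whose URL path equals `path`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if urlparse(link).path == path]


# Hypothetical markup standing in for the real referrer page source.
sample_html = """
<ul>
  <li><a href="/develop-your-career/article/Advice-for-a-freelance-career">Advice</a></li>
  <li><a href="../article/Advice-for-a-freelance-career">Advice (relative link)</a></li>
</ul>
"""
# Assumed scheme; the thread only gives the host and path.
base = "http://www.creative-choices.co.uk/industry-insight/inside/creative-writing"
duplicate_path = "/industry-insight/article/Advice-for-a-freelance-career"

print(links_to_path(sample_html, base, duplicate_path))
```

In this made-up sample the relative href resolves to the duplicate path even though it looks harmless on the page, which is the kind of thing worth ruling out in the real source.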
-
It could be any one of the following 4 scenarios.
1: The page in question was moved at some point, and since the CMS still accepts the old URL, Google still finds the page when it revisits the old URL. In this scenario it finds both the old URL and the new URL and indexes both.
2: Google hasn't revisited the page for a long while, but the page is still in its index, even though the CMS would return a 301 if Google visited it. This is easily fixed by going to Webmaster Tools and asking Google to remove the page from the index.
3: There are still links to the old URL, either on site or off site, and since the CMS doesn't 301 the old page, it gets indexed again alongside the new URL.
4: The page still exists in the CMS because of some strange setting or equivalent in the CMS.
As mentioned before, the easy fix is to use a robots.txt file to deny access to the page and ask Google to remove it from its index. The better fix is to find the problem in the CMS and solve it. A midway fix could be to 301 it in the .htaccess file, or the equivalent on an IIS server.
Hope it helped.
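To make the robots.txt and .htaccess fixes above a bit more concrete, here is a minimal sketch that builds the lines you would paste into each file. It assumes an Apache server with mod_alias enabled and an http:// scheme for the target URL. `htaccess_redirect` and `robots_disallow` are hypothetical helper names, not part of any CMS.

```python
def htaccess_redirect(old_path, new_url):
    """Build an Apache mod_alias rule that 301-redirects old_path to new_url."""
    if not old_path.startswith("/"):
        raise ValueError("old_path must start with '/'")
    return f"Redirect 301 {old_path} {new_url}"


def robots_disallow(paths, user_agent="*"):
    """Build a robots.txt group that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines)


# Paths taken from the question; the http:// scheme is an assumption.
old_path = "/industry-insight/article/Advice-for-a-freelance-career"
new_url = (
    "http://www.creative-choices.co.uk"
    "/develop-your-career/article/Advice-for-a-freelance-career"
)

print(htaccess_redirect(old_path, new_url))  # line for .htaccess
print(robots_disallow([old_path]))           # lines for robots.txt
```

Treat the two outputs as alternatives for the same URL: a crawler that obeys a Disallow rule won't request the page, so it never gets to see the 301.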
-
Thanks René,
I updated my earlier reply with a question that I think you missed.
The list isn't growing, which is a good thing, but how is it possible for the crawler to pick up the duplicate page URLs when the referrer page has the correct URLs?
-
I have come across this sort of issue a gazillion times + infinite. Almost all of our clients seem to have duplicate content problems of one kind or another, but the cause is often something different each time.
I'm afraid I can't point you in the right direction, since I have no experience with your CMS. To be able to do that I would need access to the site itself. My advice would be to get a developer on the issue or to get hold of support for the CMS (if any).
-
Hi René,
Thanks for your reply and suggestions. It could well be the CMS remembering old URLs, as this list isn't growing. But is the crawler able to pick up the old URLs when the referrer page has the correct URLs?
We are on ExpressionEngine. Have you come across this sort of issue before?
-
Well, it kind of has to be in the CMS, since it has 2 different paths. But you could fix it by going to the .htaccess file (if you have access), redirecting the duplicate to the right URL, and making a robots.txt file that disallows access to the page.
If the page has been moved to a new location, there's a good chance that the CMS is set up to remember the old URL and still show the page. This is indeed a problem, but potentially a problem with the CMS.
Go to Webmaster Tools and ask them to delete the duplicate from their index.
Your specific problem could originate from a ton of different causes, and it is kind of hard to fix without direct access to everything. What CMS are you using?
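A quick way to tell whether the CMS is still serving the old URL or already redirecting it is to request the duplicate without following redirects and look at the status code. This is only a sketch: `fetch_status` and `classify` are hypothetical helpers, plain HTTP is assumed, and the diagnosis strings are rough rules of thumb rather than anything specific to your CMS.

```python
import http.client


def fetch_status(host, path):
    """HEAD-request a path without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify(status, location, original_url):
    """Map the response for the duplicate URL onto a rough diagnosis."""
    if status in (301, 308):
        if location == original_url:
            return "permanently redirected to the original; only the index is stale"
        return "permanently redirected, but not to the original"
    if status in (302, 303, 307):
        return "temporary redirect; the old URL may stay indexed"
    if status == 200:
        return "CMS still serves the page at the old URL"
    if status in (404, 410):
        return "old URL is gone"
    return "unexpected status"


# Example call (needs network access, so it is left commented out):
# status, location = fetch_status(
#     "www.creative-choices.co.uk",
#     "/industry-insight/article/Advice-for-a-freelance-career",
# )
# print(classify(status, location, "http://www.creative-choices.co.uk"
#       "/develop-your-career/article/Advice-for-a-freelance-career"))
```

A 200 points at the CMS accepting both paths, while a 301 to the original means the redirect is already in place and only the index needs to catch up. Note that the Location header can be relative, in which case the exact-match comparison here would need loosening.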