Duplicate content warning: Same page but different urls???
-
Hi guys i have a friend of mine who has a site i noticed once tested with moz that there are 80 duplicate content warnings, for instance
Page 1 is http://yourdigitalfile.com/signing-documents.html
the warning page is http://www.yourdigitalfile.com/signing-documents.html
another example
Page 1 http://www.yourdigitalfile.com/
same second page http://yourdigitalfile.com
i noticed that the whole website is like the nealry every page has another version in a different url?, any ideas why they dev would do this, also the pages that have received the warnings are not redirected to the newer pages you can go to either one???
thanks very much
-
Thanks Tim. Do you have any examples of what those problems might be? With such a large catalog managing those rel canonical tags will be difficult (I don't even know if the store allows them, it's a hosted store solution and little code customization is allowed).
-
Hi there AspenFasteners, in this instance rather than a .HTAccess rule I would suggest applying a rel canonical tag which points to the page you deem as the original master source.
Using the robots to try and hide things could potentially cause you more issues as your categories may struggle to be indexed correctly.
-
We have a similar problem, but much more complex to handle as we have a massive catalog of 80,000 products and growing.
The problem occurs legitimately because our catalog is so large that we offer different navigation paths to the same content.
http://www.aspenfasteners.com/Self-Tapping-Sheet-Metal-s/8314.htm
http://www.aspenfasteners.com/Self-Tapping-Sheet-Metal-s/8315.htm
(If you look at the "You are here" breadcrumb trail, you will see the subtle differences in the navigation paths, with 8314.htm, the user went through Home > Screws, with 8315.htm, via Home > Security Fasteners > Screws).
Our hosted web store does not offer us htaccess, so I am thinking of excluding the redundant navigation points via robots.txt.
My question: is there any reason NOT to do this?
-
Oh ok
The only reason i was thinking it is duplicate content is the warnings i got on the moz crawl, see below.
75 Duplicate Page Content
6 4xx Client Error
5 Duplicate Page Title
44 Missing Meta Description Tag
5 Title Element is Too Short
I have found over 80 typos, grammatical errors, punctuation errors and incorrect information which was leading me to believe the quality of the work and their attention to detail was rather bad, which is why i thought this was a possibility.
Thanks again for your time its really appreciated
-
I wouldn't say that they have created two pages, it is just that because you have two versions of the domain and not set a preferred version that you are getting it indexing twice. .HTaccess changes are under the hood of the website and could have simply been an oversight.
-
Hey Tim
Thanks for your answer. It's really weird, other than lazyness on the devs part not to remove old or previous versions of pages?, have you any idea why they would create multiple versions of the same page with different url's?? is there any legit reason like ones severs mobile or something??
Just wondering thanks for replying
-
OK, so in this instance the only issue you have is that you need to choose your preferred start point - www or non www.
I would add a bit of code to your htaccess file to point to your preferred choice. I personally prefer a www. domain. Something like the below would work.
RewriteCond %{HTTP_HOST} ^example.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]As your site is already indexed I would also for the time being and as more of a safety measure add canonicals to the pages that point to the www. version of your site.
Also if you have a Google Search Console account, you can select your prefered domain prefix in there. this will again help with your indexation.
Hopefully I have covered most things.
Cheers
Tim
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages mirrored on unknown websites (not just content, all the HTML)... blackhat I've never seen before.
Someone more expert than me could help... I am not a pro, just doing research on a website... Google Search Console shows many backlinks in pages under unknown domains... this pages are mirroring the pages of the linked website... clicking on a link on the mirror page leads to a spam page with link spam... The homepage of these unknown domain appear just fine... looks like that the domain is partially hijacked... WTF?! Have you ever seen something likes this? Can it be an outcome of a previous blackhat activity?
White Hat / Black Hat SEO | | 2mlab0 -
G.A. question - removing a specific page's data from total site's results?
I hope I can explain this clearly, hang in there! One of the clients of the law firm I work for does some SEO work for the firm and one thing he has been doing is googling a certain keyword over and over again to trick google's auto fill into using that keyword. When he runs his program he generates around 500 hits to one of our attorney's bio pages. This happens once or twice a week, and since I don't consider them real organic traffic it has been really messing up my GA reports. Is there a way to block that landing page from my overall reports? Or is there a better way to deal with the skewed data? Any help or advice is appreciated, I am still so new to SEO I feel like a lot of my questions are obvious, but please go easy on me!
White Hat / Black Hat SEO | | MyOwnSEO0 -
Will blank category pages automatically get updated
Hello, We've got old category pages that are blank like domain/shoes.html (blank white page not in menu anymore) domain/newshoesurl.html (working URL with link in menu) Will the blank pages be automatically deindexed and updated by Google?
White Hat / Black Hat SEO | | BobGW0 -
Content ideas?
We run a printing company and we are struggling to come up with unique content people will actually want to know, is there any way of getting the ball rolling? We were thinking of ideas such as exhibition guide but this seems to have been overdone. Any help would be appreciated.
White Hat / Black Hat SEO | | BobAnderson0 -
Why do websites use different URLS for mobile and desktop
Although Google and Bing have recommended that the same URL be used for serving desktop and mobile websites, portals like airbnb are using different URLS to serve mobile and web users. Does anyone know why this is being done even though it is not GOOD for SEO?
White Hat / Black Hat SEO | | razasaeed0 -
Should I redirect old pages
I have taken over SEO on a site. The old people built thousands of pages with duplicate content. I am converting site to wordpress and was wondering if I should take the time to 301 redirect all 10,000 or so pages with duplicate content. The 10,000 pages all have links back to different to different pages as well as to the homepage. Should I just let them go to a 404 page not found.
White Hat / Black Hat SEO | | Roots70 -
Using Programmatic Content
My company has been approached a number of times by computer generated content providers (like Narrative Science and Comtex). They are providing computer generated content to a number of big name websites. Does anyone have any experience working with companies like this? We were burned by the first panda update because we were busing boilerplate forms for content
White Hat / Black Hat SEO | | SuperMikeLewis0 -
Is pulling automated news feeds on my home page a bad thing?
I am in charge of a portal that relies on third-party content for its news feeds. the third-party in this case is a renowned news agency in the united kingdom. After the panda and penguin updates, will these feeds end up hurting my search engine rankings? FYI: these feeds occupy only 20 percent of content on my domain. The rest of the content is original.
White Hat / Black Hat SEO | | amit20760