Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags
-
Hi Moz Community,
We have the following robots command that should prevent URLs with tracking parameters being indexed.
Disallow: /*?
We have noticed google has started indexing pages that are using tracking parameters. Example below.
These pages are identified as duplicate content yet have the correct canonical tags:
With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query that Google seems to be willing to index, ignoring both robots rules & canonical tags.
Can anyone shed any light onto the situation?
-
Google's multi-layered multi-algorithm system has come a long way in being able to "figure it all out", yet at the same time, falls far short of always successfully "getting it right".
Robots.txt files are no longer an absolute directive. They're now "just another signal", as are canonical tags, meta robots instructions, and their own Google Webmaster URL Parameters system.
Because of this its critical to be consistent across all signals. If you've got the robots.txt file set to not index pages, but also have inbound links from affiliates, that's a prime example of where inbound link signals can override the robots.txt file's instruction if they're not nofollowed links.
While they technically SHOULD not index them after discovering them off-site (because the destination says "index this other version"), that's part of their confused multilayered system.
I have a question though - from what limited information you've provided, this example is based on a url parameter of ?ec=
When I search Google using site:http://www.oakfurnitureland.co.uk/ inurl:ec
I see only three such pages indexed AND where those pages are "fully" indexed. All the rest (over 1,000 additional URLs), are in the Google system, however every one of those others has a meta description of "A description for this result is not available because of this site's robots.txt - learn more."
What that means is they are NOT fully indexing those pages - there is no worry to be had about duplicate content for those. Google is simply tracking that those URLs exist.
So - is that the only URL parameter you're worried about? If so, it's not a major problem on your site. Except for those few exceptions, Google is doing what you need them to do with those.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Are there any downsides to using a canonical tag temporarily?
I'm working on redesigning our website. One of the content types has a main archive page (/success-stories) containing all of the success stories (written by graduates of our program). Because we plan to have success stories for other people (non-graduates), I'm using category hierarchies (/success-stories/graduates and success-stories/nonprofits, for example). It will go one level deeper to organize graduates by graduation year (/success-stories/graduates/%year%). I think this will work out well. However, we won't have non-graduate success stories for a little while, probably at least a few weeks, which means that /success-stories and /.../graduates indices will contain the same content for a while. So my question is this: Will it hurt to use a canonical tag that points to /success-stories/graduates as the authority until the main archive page contains more than just graduates? Or would it be better to use a 302 redirect from /success-stories to /.../graduates until more diverse content is added?
Intermediate & Advanced SEO | | bcaples0 -
Did Google Ignore My Links?
Hello, I'm a little new to SEO, but I recently was featured (around 2 yrs ago) on some MAJOR tech blogs. For some reason however, my links aren't getting picked up for over 2 years - not even in MOZ, or other link checker services. - By now I should have had amazing boost from this natural building, but not sure what happened? This was completely white hat and natural links. The links were after the article was created though, would this effect things? - Please let me know if you have any advice! - Maybe I need to ping these some how or something? - Are these worthless? Thanks so much for your help! Here's some samples of the links that were naturally given to http://VaultFeed.com http://thenextweb.com/microsoft/2013/09/13/microsoft-posts-cringe-worthy-windows-phone-video-ads-mocking-apple/ http://www.theverge.com/2013/9/15/4733176/microsoft-says-pulled-iphone-parody-ads-were-off-the-mark http://www.theregister.co.uk/2013/09/16/microsoft_mocks_apple_in_vids_it_quickly_pulls/ http://www.dailymail.co.uk/sciencetech/article-2420710/Microsoft-forced-delete-cringe-worthy-spoof-videos-mocking-new-range-iPhones.html And a LOT more... Not sure if these links will never be valid, or maybe I'm doing something completely wrong? - Is there any way for Google to recognize these now, and then they'll be seen by MOZ and other sites too? I've done a LOT of searching and there's no definitive advice I've seen for links that were added after the URL was first indexed by Google.
Intermediate & Advanced SEO | | DByers0 -
Pagination & Canonicals
Hi I've been looking at how we paginate our product pages & have a quick question on canonicals. Is this the right way to display.. Or should the canonical point to the main page http://www.key.co.uk/en/key/euro-containers-stacking-containers, so Google doesn't pick up duplicate meta information? Thanks!
Intermediate & Advanced SEO | | BeckyKey0 -
Meta Robot Tag:Index, Follow, Noodp, Noydir
When should "Noodp" and "Noydir" meta robot tag be used? I have hundreds or URLs for real estate listings on my site that simply use "Index", Follow" without using Noodp and Noydir. Should the listing pages use these Noodp and Noydr also? All major landing pages use Index, Follow, Noodp, Noydir. Is this the best setting in terms of ranking and SEO. Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
Canonical or No-index
Just a quick question really. Say I have a Promotions page where I list all current promotions for a product, and update it regularly to reflect the latest offer codes etc. On top of that I have Offer announcement posts for specific promotions for that product, highlighting very briefly the promotion, but also linking back to the main product promotion page which has a the promotion duplicated. So main page is 1000+ words with half a dozen promotions, the small post might be 200 words, and quickly become irrelevant as it is a limited time news article. Now, I don't want the promotion page indexed (unless it has a larger news story attached to the promotion, but for this purpose presume it is doesn't). Initially the core essence of the post will be duplicated in the main Promotion page, but later as the offer expires it wouldn't be. Therefore would you Rel Canonical or just simply No-index?
Intermediate & Advanced SEO | | TheWebMastercom0 -
How to get content to index faster in Google.....pubsubhubbub?
I'm curious to know what tools others are using to get their content to index faster (other than html sitmap and pingomatic, twitter, etc) Would installing the wordpress pubsubhubbub plugin help even though it uses pingomatic? http://wordpress.org/extend/plugins/pubsubhubbub/
Intermediate & Advanced SEO | | webestate0 -
How to Disallow Tag Pages With Robot.txt
Hi i have a site which i'm dealing with that has tag pages for instant - http://www.domain.com/news/?tag=choice How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed. Any suggestions? Cheers, Mark
Intermediate & Advanced SEO | | monster990 -
Need to duplicate the index for Google in a way that's correct
Usually duplicated content is a brief to fix. I find myself in a little predicament: I have a network of career oriented websites in several countries. the problem is that for each country we use a "master" site that aggregates all ads working as a portal. The smaller nisched sites have some of the same info as the "master" sites since it is relevant for that site. The "master" sites have naturally gained the index for the majority of these ads. So the main issue is how to maintain the ads on the master sites and still make the nische sites content become indexed in a way that doesn't break Google guide lines. I can of course fix this in various ways ranging from iframes(no index though) and bullet listing and small adjustments to the headers and titles on the content on the nisched sites, but it feels like I'm cheating if I'm going down that path. So the question is: Have someone else stumbled upon a similar problem? If so...? How did you fix it.
Intermediate & Advanced SEO | | Gustav-Northclick0