Duplicate site (disaster recovery) being crawled and creating two indexed search results

OpenTable

I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact copy of the toptable.co.uk domain.

Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as the .gtm site is part of the opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.

There seem to be two potential fixes. Which is best for this case?

1) Use robots.txt to block Google from crawling the .gtm site

2) Canonicalize the gtm URLs to toptable.co.uk

In general Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.

Thanks in advance to the SEOmoz community!

Dr-Pete

It's a little tricky. Andrea is right that robots.txt isn't great for removal once pages/domains are indexed. However, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution; it's a delicate procedure.
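As a rough illustration of the robots.txt step, the sketch below uses Python's standard `urllib.robotparser` to check that a blanket `Disallow: /` rule would keep Googlebot off every path on the backup host. The rule set, the helper name `is_blocked`, and the sample URLs are assumptions for illustration, not the actual OpenTable setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the backup sub-domain only: block everything.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# For comparison: an empty Disallow value blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample = "http://uk-www.gtm.opentable.com/some/page"  # illustrative path
    print(is_blocked(BLOCK_ALL, sample))
    print(is_blocked(ALLOW_ALL, sample))
```

One thing to watch with this route: if the same robots.txt file is served when toptable.co.uk is CNAMEd to the backup during a failover, it would block the primary domain too, so the blocking file probably needs to be served only when the request is for the gtm hostname.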

Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to see those signals, so you'll need to keep the paths open (that is, not blocked in robots.txt).

josh-riley

First, if the pages are already indexed then a robots.txt won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag, and then work to remove the pages from the index. A robots.txt doesn't necessarily accomplish the same result.
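A minimal sketch of the noindex idea, assuming the backup servers can see which hostname was requested (the HTTP `Host` header) and that page templates can call a small helper. The function name `robots_meta_for` and the constants are hypothetical; only the `<meta name="robots" content="noindex">` tag itself is standard.

```python
# Hostname of the disaster recovery copy, taken from the question.
BACKUP_HOST = "uk-www.gtm.opentable.com"

# Standard robots meta tag asking search engines not to index the page.
NOINDEX_META = '<meta name="robots" content="noindex">'


def robots_meta_for(host_header: str) -> str:
    """Return the noindex tag when the request targets the backup hostname.

    Any other hostname (for example toptable.co.uk during a failover)
    gets an empty string, so the primary domain stays indexable.
    """
    host = host_header.split(":")[0].strip().lower()  # drop an optional port
    return NOINDEX_META if host == BACKUP_HOST else ""


if __name__ == "__main__":
    print(robots_meta_for("uk-www.gtm.opentable.com"))
    print(repr(robots_meta_for("toptable.co.uk")))
```

Keying the tag on the requested hostname matters here because, after a CNAME switch, the same backup servers would be answering for toptable.co.uk, and you would not want those responses to carry noindex.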

OlegKorneitchouk

If you can do a 1-to-1 page canonicalization (each page on the .gtm backup site is canonicalized to the equivalent page on toptable.co.uk) then I would do that.

Otherwise, I would noindex the backup site.
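For what 1-to-1 canonicalization could look like in practice, here is a small sketch that maps a backup URL to the same path on the primary domain and builds the `<link rel="canonical">` tag for it. It assumes paths and query strings are identical on both hosts, as the question describes; the helper names are made up for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

BACKUP_HOST = "uk-www.gtm.opentable.com"  # disaster recovery copy
PRIMARY_HOST = "toptable.co.uk"           # domain that should rank


def canonical_url(backup_url: str) -> str:
    """Map a backup-site URL to the same path and query on the primary domain."""
    parts = urlsplit(backup_url)
    if parts.hostname != BACKUP_HOST:
        raise ValueError(f"not a backup-site URL: {backup_url}")
    # Keep scheme, path and query; swap the host; drop any fragment.
    return urlunsplit((parts.scheme, PRIMARY_HOST, parts.path or "/", parts.query, ""))


def canonical_link_tag(backup_url: str) -> str:
    """Build the canonical link tag to place in the page's <head>."""
    href = escape(canonical_url(backup_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("http://uk-www.gtm.opentable.com/london-restaurants?page=2"))
```

Because the tag always points at toptable.co.uk, it should be harmless if the backup ends up serving the primary domain during a failover: the pages would simply declare themselves canonical.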
