My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Are there any other precautions I should be taking? Please advise.
-
For your beta sites in future, I would recommend using HTTP Basic Authentication so that spiders can't even access them (this is for Apache):
AuthUserFile /var/www/sites/passwdfile
AuthName "Beta Realm"
AuthType Basic
require valid-user
Then create the user (add -c the first time, to create the password file):
htpasswd -m /var/www/sites/passwdfile username
If you do this as well, Google's removal tool will see that the page is no longer there and remove it, because it usually checks the content of the page before processing a removal. If the text is still accessible, they MAY not process the removal request (even if the page has noindex, though I don't know whether that's the case).
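If you want to confirm the password protection is actually in place, a rough check along these lines might help. This is only a sketch: the function names `auth_challenge` and `probe` are made up for illustration, and it assumes the server answers unauthenticated requests with a 401 status and a Basic `WWW-Authenticate` header, which is what the Apache setup above should produce.

```python
import urllib.error
import urllib.request


def auth_challenge(status, headers):
    """Return True when a response looks like an HTTP Basic auth challenge."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    challenge = lowered.get("www-authenticate", "")
    return status == 401 and challenge.lower().startswith("basic")


def probe(url):
    """Fetch url without credentials and return (status, headers)."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers live on the error.
        return err.code, dict(err.headers.items())
```

You could then run `auth_challenge(*probe("http://beta.website.com/"))` against your own beta host; a crawler gets the same challenge and never sees the page content.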
-
-
In Webmaster Tools, set the subdomain up as its own site and verify it.
-
Put this in the robots.txt for the subdomain (beta.website.com/robots.txt):
User-agent: *
Disallow: /
-
You can then submit this site for removal in Google Webmaster Tools:
- Click "optimization" and then "remove URLs"
- Click "create a new removal request"
- Type the URL "http://beta.website.com/" in there
- Click "continue"
- Click "submit request".
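Before submitting the removal request, it may be worth sanity-checking that the robots.txt really blocks everything. Here is a small sketch using Python's standard `urllib.robotparser`; the helper name `blocks_everything` and the sample URL are assumptions for illustration, not anything from the tool itself.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as above, as they would appear in beta.website.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def blocks_everything(robots_txt, sample_url, agent="Googlebot"):
    """Return True when robots_txt forbids `agent` from fetching sample_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, sample_url)


if __name__ == "__main__":
    print(blocks_everything(ROBOTS_TXT, "http://beta.website.com/any/page"))
```

This only tells you how a rule-following parser reads the file. It says nothing about whether the pages have already been dropped from the index.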
-
-
Agreed on all counts with Mark. In addition, if you haven't done this already, make sure you have canonical tags in place on your pages. Good luck!
-
You can add noindex to the whole subdomain, and then wait for the crawlers to remove it.
Or you can register the subdomain with Webmaster Tools, block the subdomain via robots.txt with a general Disallow: / for the entire subdomain, and then use the URL removal tool in Webmaster Tools to remove the subdomain. Just a robots.txt block won't work - it won't remove the pages, it'll just prevent them from being crawled again.
In your case, I would probably go the route of the robots.txt / URL removal tool. This will remove the pages from Google. Once that has happened, I would add the noindex tag to the whole subdomain and remove the robots.txt block. Crawlers can only see a noindex tag on pages they are allowed to fetch, so this way all search engines should either not index the pages or remove them from their index.
Mark
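Once the noindex tag is rolled out, you may want to spot-check pages to make sure it is really there. A minimal sketch, assuming noindex is delivered either as a robots meta tag or as an `X-Robots-Tag` response header; the names `RobotsMetaParser` and `has_noindex` are hypothetical.

```python
from html.parser import HTMLParser


def _directives(value):
    """Split a comma-separated robots value into lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives |= _directives(attr.get("content") or "")


def has_noindex(html, headers=None):
    """Return True when the page or its headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            found |= _directives(value)
    return "noindex" in found
```

For example, `has_noindex('<meta name="robots" content="noindex, nofollow">')` is true, while a page with no robots meta tag and no header gives false.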