How preproduction website is getting indexed in Google.
-
Hi team,
Can anybody please help me to find how my preproduction website and urls are getting indexed in Google.
-
As Eric hinted, the best method to prevent any pages being indexed would be to use htaccess password protection dialog on your development site. It's fairly easy to implement. You can find instructions to do so here: http://www.htaccesstools.com/articles/password-protection/
-
Hi Anoop! Have everyone's answers helped? Do you still have any questions?
-
Anoop, when a 'development' or 'preproduction' website or subdomain is getting indexed, that means that you haven't stopped the search engines from crawling it. The search engines, especially Google, are very aggressive at crawling, and they will crawl just about any URL that they find. It seems as though all you have to do is visit that page and it's going to get crawled.
Best way to stop Google from crawling (then indexing) a website is to stop it from getting crawled using the robots.txt file. Keep in mind, though, that even if you tell them to stay out of it using the robots.txt file they will still index those URLs.
The only way to stop Google from crawling would be to password protect the website or make it available only on a private server, or available via VPN only.
-
In addition to noindexing the pages using the meta tag, if you have WMT / Search Console set up, you can request Google remove those URLs from their index for the time being. I've found that this may take up to a couple of hours from the removal request to the time of actual removal.
As to how they were found, there's a good chance that Google crawled a link to a preproduction webpage and went from there.
-
Hi
To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the section of your page:
To prevent only Google web crawlers from indexing a page:
You should be aware that some search engine web crawlers might interpret the
noindex
directive differently. As a result, it is possible that your page might still appear in results from other search engines.here is complete guide: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
-
Hi,
Have you noindexed & nofollowed the site and pages? I would also suggest you block all crawlers by disallowing access in the robots.txt file.
Do you know if this has all been done?
-Andy
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can't get Google to index our site although all seems very good
Hi there, I am having issues getting our new site, https://vintners.co indexed by Google although it seems all technical and content requirements are well in place for it. In the past, I had way poorer websites running with very bad setups and performance indexed faster. What's concerning me, among others, is that the crawler of Google comes from time to time when looking on Google Search Console but does not seem to make progress or to even follow any link and the evolution does not seem to do what google says in GSC help. For instance, our sitemap.xml was submitted, for a few days, it seemed like it had an impact as many pages were then visible in the coverage report, showing them as "detected but not yet indexed" and now, they disappeared from the coverage report, it's like if it was not detected any more. Anybody has any advice to speed up or accelerate the indexing of a new website like ours? It's been launched since now almost two months and I was expected, at least on some core keywords, to quickly get indexed.
Technical SEO | | rolandvintners1 -
Get List Of All Indexed Google Pages
I know how to run site:domain.com but I am looking for software that will put these results into a list and return server status (200, 404, etc). Anyone have any tips?
Technical SEO | | InfinityTechnologySolutions0 -
Dev Site Was Indexed By Google
Two of our dev sites(subdomains) were indexed by Google. They have since been made private once we found the problem. Should we take another step to remove the subdomain through robots.txt or just let it ride out? From what I understand, to remove the subdomain from Google we would verify the subdomain on GWT, then give the subdomain it's own robots.txt and disallow everything. Any advice is welcome, I just wanted to discuss this before making a decision.
Technical SEO | | ntsupply0 -
Staging site and "live" site have both been indexed by Google
While creating a site we forgot to password protect the staging site while it was being built. Now that the site has been moved to the new domain, it has come to my attention that both the staging site (site.staging.com) and the "live" site (site.com) are both being indexed. What is the best way to solve this problem? I was thinking about adding a 301 redirect from the staging site to the live site via HTACCESS. Any recommendations?
Technical SEO | | melen0 -
Website being crawled but not indexed any thoughts?
Hi Everyone,
Technical SEO | | Ant71
I created a new website a few weeks ago www.drivingseaford.co.uk , did a little link citation, links from Google+, submitted to webmaster tools etc but its still not getting indexed. Webmaster tools crawl stats page is showing pages being crawled, no errors. But 0 indexed. http://www.drivingseaford.co.uk/robots.txt is showing User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/ Im a bit stumped as never had this before!!! Any ideas from you lovely people?? Antony0 -
Two companies merge: website A redirect 301 to website B. Problems?
Hi, last december the company I work for and another company merged. The website of company A was taken offline and the home page was 302 redirected to a page on website B. This page had information about the merger and the consequences for customers. The deeper pages of website A were 301 redirected to similar pages on website B. After a while, the traffic from the redirected home page decreased and we thought it was time to change the redirect from a 302 into a 301 redirect to the home page. Because there are still a lot of links to the home page of website A and we wanted to preserve the link juice. Two weeks ago we changed the 302 redirect from website A into a 301 redirect to the home page of website B. Last week the Google webmaster tools account of website B showed the links from the 301 redirected website A. The total amount of links doubled and the top anchor text is the name of company A instead of company B. This, off course, could trigger an alarm at Google. Because we got a lot of new links with a different anchor text. A tactic used by spammers/black-hats. I am a bit worried that our change will be penalized by Google. But our change is legit. It is to the advantage of our customers to find us if they search for the name of company A or click on a link to website A. We didn´t change the change of address of domain A in Google webmaster tools yet. Is it a good idea to change the change of address of domain A into domain B? Are there other precautions we can take?
Technical SEO | | NN-online0 -
Should I Do On Site Optimization For A Website That Will Get A New Design
Would it be wise for me to start implementing onsite optimization changes on a website, such as the changing urls, adding in keywords in meta tags, meta descriptions, etc if the website is about to get a totally new design. For example if I wanted to change the url structure and onsite optimization features would the changes still be on the new website.
Technical SEO | | TSpike10 -
De-indexing thin content & Panda--any advantage to immediate de-indexing?
We added the nonidex, follow tag to our site about a week ago on several hundred URLs, and they are still in Google's index. I know de-indexing takes time, but I am wondering if having those URLs in the index will continue to "pandalize" the site. Would it be better to use the URL removal request? Or, should we just wait for the noindex tags to remove the URLs from the index?
Technical SEO | | nicole.healthline0