Clarification regarding robots.txt protocol
-
Hi,
I have a website with more than 1,000 URLs, all of which are already indexed in Google. I am now going to stop all the services offered on the website and have removed all the landing pages, so only the home page is available. I therefore need to remove all the indexed URLs from Google. I have already used the robots.txt protocol to remove URLs, but I guess adding a bulk amount of URLs (nearly 1,000) to robots.txt is not a good method. So I just wanted to know: is there any other method for removing indexed URLs?
Please advise.
-
If the pages are already indexed and you want them to be completely removed, you need to allow the crawlers in robots.txt and noindex the individual pages.
So if you just block the site with robots.txt (and I recommend blocking via folders or variables, not individual pages) while the pages are indexed, they will continue to appear in search results, but with a description along the lines of "this page is being blocked by robots.txt". They will still rank and appear because of the cached data.
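To illustrate the folder-level blocking mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/services/` folder and the `example.com` URLs are made up for illustration; they are not from the original question. It only shows that one folder rule covers every URL underneath it, so there is no need to list 1,000 pages one by one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a single folder rule instead of ~1,000 per-page rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /services/
"""


def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Assumed example URLs; replace with your own paths.
    for url in (
        "https://example.com/services/page-1",
        "https://example.com/services/page-999",
        "https://example.com/",
    ):
        state = "blocked" if is_blocked(ROBOTS_TXT, url) else "crawlable"
        print(url, state)
```

Keep in mind that this only controls crawling. As described above, a blocked page that is already indexed can still show up in results.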
If you add the noindex tags to your pages instead, the next time crawlers visit the pages they will see the new tag and remove the page from the search index (meaning it won't show up at all). However, make sure your robots.txt isn't blocking the crawlers from seeing this updated code.
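If it helps, the two conditions above (the page carries a noindex, and robots.txt does not stop crawlers from seeing it) can be checked with a rough sketch like the one below. The function names are hypothetical, and it works on text you have already fetched, so how you download the page and robots.txt is up to you. It also looks at the `X-Robots-Tag` response header, which is another place a noindex can be set.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"|"googlebot"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def has_noindex(html, headers=None):
    """True if the page signals noindex via a meta tag or an X-Robots-Tag header."""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def crawl_allowed(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def noindex_can_take_effect(html, robots_txt, url, headers=None):
    """The noindex is only seen if the crawler is allowed to fetch the page."""
    return crawl_allowed(robots_txt, url) and has_noindex(html, headers)
```

For example, a page with `<meta name="robots" content="noindex">` passes the check when robots.txt allows crawling, and fails it when robots.txt contains `Disallow: /`, because the crawler never gets to read the tag.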
-
There are a few ways to do this.
First, I would use the Google Removal Tool to remove those URLs. More information here: https://support.google.com/webmasters/answer/1663419?hl=en
Then, using the robots.txt file is fine, but you need to make sure that you're listing the correct URLs or URL paths there.
I would also make sure that you are returning a "410 Gone" status in the server header, and not a 404 error. The 410 Gone will get those URLs removed faster.
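As a rough illustration of the 410 idea, here is a small sketch using Python's built-in `http.server`. It assumes, as in the question, that only the home page is still live; the `LIVE_PATHS` set and the port are placeholders. On a real site you would normally configure this in your web server or CMS rather than run a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: only the home page still exists; every other path was removed.
LIVE_PATHS = {"/"}


def status_for(path: str) -> int:
    """Return 200 for pages that still exist and 410 (Gone) for everything else."""
    clean_path = path.split("?", 1)[0]  # ignore any query string
    return 200 if clean_path in LIVE_PATHS else 410


class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        if status == 200:
            body = b"Home page"
        else:
            body = b"This page has been permanently removed."
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Local test server; the address and port are arbitrary.
    HTTPServer(("127.0.0.1", 8000), GoneHandler).serve_forever()
```

With the server running, requesting `/` returns 200 and any removed landing page returns 410 instead of 404.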
-
If the goal is to get the URLs out of the search engine index, then there are a few solutions that can work for you:
- The one you mentioned: I think it's bad to add 1,000+ URLs to the robots.txt file unless it makes sense for your business.
- Adding a meta noindex tag to the pages (if the pages physically exist).
Also, to remove them from the index quickly, you can update the robots.txt file and then go to GWC and use the Remove URLs feature.
Just a thought!