Meta robots on every page rather than robots.txt for blocking crawlers? How will pages get indexed if we block crawlers?

vtmoz

Hi all,

The suggestion to use the meta robots tag rather than the robots.txt file is meant to make sure pages do not get indexed even if links to them are available elsewhere on the internet. I don't understand how the pages could be indexed if the entire site is blocked. Even if links to the pages exist, will Google really index them? One of our sites has been blocked in the robots.txt file for years, and although links to its internal pages are available on the internet, those pages have not been indexed. So technically the robots.txt file is enough, right? Please clarify and correct me if I'm wrong.

Thanks

ThompsonPaul

I agree with Gaston's approach right up to step 4. If you add the no-indexed pages back into a block in the robots.txt file, you'll end up back where you started. Google will still discover the no-indexed URLs elsewhere, the robots.txt block will stop it from crawling those pages and seeing the noindex tag, and the URLs will likely start to get added to the index again.

No-indexed URLs must not be blocked in robots.txt. The two mechanisms are mutually exclusive: a crawler can only obey a noindex tag on a page it is allowed to crawl.
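A minimal sketch of why the two mechanisms conflict, assuming a crawler that checks robots.txt before downloading any HTML (which is how Googlebot behaves). It uses Python's standard `urllib.robotparser`; the robots.txt rules, the example.com URLs and the `sees_noindex` helper are all hypothetical, for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a section (like Gaston's step 4).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical pages; both carry a meta robots noindex tag in their HTML.
PAGES = {
    "https://example.com/private/page.html":
        '<meta name="robots" content="noindex,follow">',
    "https://example.com/public/page.html":
        '<meta name="robots" content="noindex,follow">',
}


def sees_noindex(url: str, parser: RobotFileParser, agent: str = "Googlebot") -> bool:
    """Return True only if a robots.txt-respecting crawler may fetch the page
    and can therefore read its noindex directive."""
    if not parser.can_fetch(agent, url):
        # Blocked: the HTML is never downloaded, so the noindex is never seen.
        return False
    # Naive substring check on the hypothetical tag, enough for this sketch.
    return "noindex" in PAGES[url]


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in PAGES:
    print(url, "noindex seen:", sees_noindex(url, parser))
```

Only the unblocked URL has its noindex read; the blocked one can still be discovered through external links, but its noindex tag is invisible to the crawler.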

Gaston Riera

Hi there,

TL;DR: the way to deindex pages and keep them from ever being indexed again:

Allow the site to be crawled (in robots.txt)
Apply the meta robots tag: noindex,follow
Wait some weeks until the pages are completely deindexed
Block the entire site/section with robots.txt
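Step 2 relies on crawlers reading the `noindex,follow` directive from each page's HTML. A minimal sketch of how such a directive can be extracted, using Python's standard `html.parser`; the `MetaRobotsParser` class, the `robots_directives` helper and the sample page are hypothetical, and real crawlers also honour other sources (such as HTTP headers) that this sketch ignores.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the lowercased meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page using the noindex,follow combination from step 2.
page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

The parser only works if the crawler is allowed to download the page in the first place, which is why step 1 keeps the site crawlable.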

Robots.txt and the robots meta tag can have the same effect, but to understand them they must be analyzed separately.

Robots.txt: here you just tell bots where they can go BEFORE they crawl any of the website. This is just a signal, not a directive, because robots can choose to ignore what's in the file. You can block the entire website, an entire section or just specific pages. More info: Robots.txt official page and a really cool and complete guide to robots.txt
Robots meta tag: this gives you more signals to send; the most used are noindex, nofollow and follow, because of the usual indexing issues. More info: Robots.txt official page, Google developers, Meta Robots directive - Moz and a complete guide to meta robots tag - YOAST.
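To make the three robots.txt scopes described above concrete (entire site, one section, one page), here is a minimal sketch using Python's standard `urllib.robotparser`. The rule sets, the example.com URLs and the `allowed_urls` helper are hypothetical, and the results only describe crawlers that choose to respect the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt variants, one per blocking scope.
VARIANTS = {
    "entire site": "User-agent: *\nDisallow: /\n",
    "one section": "User-agent: *\nDisallow: /blog/\n",
    "one page": "User-agent: *\nDisallow: /blog/old-post.html\n",
}

# Hypothetical URLs to test against each variant.
URLS = [
    "https://example.com/",
    "https://example.com/blog/old-post.html",
    "https://example.com/blog/new-post.html",
    "https://example.com/about.html",
]


def allowed_urls(robots_txt: str, agent: str = "*") -> list[str]:
    """Return the URLs that a robots.txt-respecting crawler may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in URLS if parser.can_fetch(agent, url)]


for scope, rules in VARIANTS.items():
    print(f"{scope}: {allowed_urls(rules)}")
```

Disallow rules match by path prefix, so `Disallow: /` covers every URL on the host, while `Disallow: /blog/` leaves the homepage and `/about.html` crawlable.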

Hope this is what you wanted.
Best of luck
GR.
