Is the robots meta tag more reliable than robots.txt at preventing indexing by Google?
-
What's your experience of using the robots meta tag vs. robots.txt as a stand-alone solution to prevent Google indexing?
I am pretty sure the robots meta tag is more reliable. Going on my own experience, I have never had any problems with robots meta tags, but plenty with robots.txt as a stand-alone solution.
Thanks in advance, Luke
-
Hi there,
Regarding the X-Robots-Tag: we have had a couple of sites that were disallowed in robots.txt still get their PDF, DOC and similar files indexed. I understand the reasoning for this. I would like to remove the disallow from robots.txt and use the X-Robots-Tag header to noindex all pages as well as PDF, DOC and other files. This is for an nginx configuration. Does anyone know what the X-Robots-Tag directive would look like in this case?
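In nginx itself this is normally done with the `add_header` directive (for example `add_header X-Robots-Tag "noindex";` in the relevant `server` or `location` block). The Python below is not nginx config; it is a hypothetical WSGI middleware that illustrates the same idea for an app served from Python. Every response, including PDFs and Word files that cannot carry a meta tag, gets the header. The names `add_x_robots_tag` and `pdf_app` are made up for the sketch.

```python
# Hypothetical WSGI middleware: adds an X-Robots-Tag header to every response
# the wrapped app sends, whatever its content type (HTML, PDF, DOC, ...).
def add_x_robots_tag(app, value="noindex, nofollow"):
    def wrapped(environ, start_response):
        def start_response_with_tag(status, headers, exc_info=None):
            # Append the header; headers the app already set are left untouched.
            return start_response(status, list(headers) + [("X-Robots-Tag", value)], exc_info)
        return app(environ, start_response_with_tag)
    return wrapped


# A toy app serving a PDF, the kind of file that cannot carry a meta tag.
def pdf_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4 (placeholder bytes)"]


app = add_x_robots_tag(pdf_app)
```

Served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, every URL would come back with `X-Robots-Tag: noindex, nofollow` next to the PDF content type. As the thread points out, this only works if robots.txt no longer disallows those URLs, because a crawler that never fetches the file never sees the header.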
-
Test what works for your site.
Use the tools below:
- https://www.deepcrawl.com/ (will give you one free full crawl)
- https://www.screamingfrog.co.uk/seo-spider/ (free up to 500 URLs)
- http://urlprofiler.com/ (14-day free trial)
- https://www.deepcrawl.com/blog/best-practice/noindex-disallow-nofollow/
- https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#robots-txt
- https://www.deepcrawl.com/blog/best-practice/noindex-and-google/
So much info
https://www.deepcrawl.com/blog/tag/robots-txt/
Thomas
-
Hi Luke,
In order to exclude individual pages from search engine indices, the noindex meta tag is actually superior to robots.txt.
But the X-Robots-Tag HTTP header is the best option, although it is much harder to use.
Block all web crawlers from all content
User-agent: *
Disallow: /
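Feeding the "block everything" rule above to Python's standard-library `urllib.robotparser` makes the limitation concrete: the only question robots.txt can answer is whether a given user-agent may crawl a URL. There is no call for "may this URL appear in results", because robots.txt does not express that. The `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The "block all web crawlers from all content" rule, as robots.txt lines.
rules = ["User-agent: *", "Disallow: /"]

robots = RobotFileParser()
robots.parse(rules)

# can_fetch() answers one question only: may this user-agent crawl this URL?
for url in ("https://example.com/", "https://example.com/files/report.pdf"):
    print(url, robots.can_fetch("Googlebot", url))  # False for both URLs
```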
Using the robots.txt file, you can tell a spider where it cannot go on your site. You cannot tell a search engine which URLs it cannot show in the search results. This means that not allowing a search engine to crawl a URL (called “blocking” it) does not mean that URL will not show up in the search results. If the search engine finds enough links to that URL, it will include it; it will just not know what’s on that page. If you want to reliably block a page from showing up in the search results, you need to use a meta robots noindex tag. That means the search engine has to be able to crawl that page and find the noindex tag, so the page should not be blocked by robots.txt.
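The interaction described above (noindex only works if the page is crawlable) can be sketched as a small audit function using only the standard library. It is a simplified model, assuming a single `<meta name="robots">` check and ignoring X-Robots-Tag headers; `noindex_takes_effect` and the sample rules and page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def noindex_takes_effect(robots_lines, url, html, agent="Googlebot"):
    """True only if the crawler may fetch `url` AND the page carries noindex."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(agent, url):
        # Blocked by robots.txt: the HTML is never fetched, so the tag is never seen.
        return False
    meta = MetaRobotsParser()
    meta.feed(html)
    return "noindex" in meta.directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
rules = ["User-agent: *", "Disallow: /private/"]
print(noindex_takes_effect(rules, "https://example.com/private/a.html", page))  # False
print(noindex_takes_effect(rules, "https://example.com/public/a.html", page))   # True
```

The same noindex tag is ignored under `/private/` simply because the crawler is not allowed to download the page.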
In a nutshell, what a robots.txt file does is tell search engines not to crawl a particular page, file or directory of your website. Using it helps both you and search engines such as Google: by not providing access to certain unimportant areas of your website, you can save on your crawl budget and reduce load on your server.
Please note that using the robots.txt file to hide your entire website from search engines is definitely not recommended. See the full-size image: http://i.imgur.com/MM7hM4g.png
_(…)_ _(…)_
The robots meta tag in the above example instructs all search engines not to show the page in search results. The value of the name attribute (robots) specifies that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user-agents (a crawler uses its user-agent to request a page). Google's standard web crawler has the user-agent name Googlebot.
To prevent only Googlebot from indexing your page, update the tag as follows: <meta name="googlebot" content="noindex">
This tag now instructs Google (but no other search engines) not to show this page in its web search results. Both the name and the content attributes are non-case sensitive.
Search engines may have different crawlers for different properties or purposes. See the complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag: <meta name="googlebot-news" content="noindex">
If you need to specify multiple crawlers individually, it's okay to use multiple robots meta tags.
If competing directives are encountered by our crawlers, we will use the most restrictive directive we find.
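The per-crawler targeting and the "most restrictive wins" rule can be modelled in a few lines. This is a simplified sketch, not Google's actual logic: it assumes the tags have already been extracted as `(name, content)` pairs, treats `robots` as addressing every crawler, and only resolves the index/noindex and follow/nofollow pairs (shorthands such as `none` are not expanded). `effective_directives` is a hypothetical name.

```python
def effective_directives(meta_tags, crawler):
    """Directives that apply to `crawler`, given (name, content) pairs from a page's meta tags."""
    crawler = crawler.lower()
    directives = set()
    for name, content in meta_tags:
        # name="robots" addresses every crawler; any other name must match this crawler.
        if name.lower() in ("robots", crawler):
            directives |= {d.strip().lower() for d in content.split(",") if d.strip()}
    # Most restrictive wins when the tags conflict.
    if "noindex" in directives:
        directives.discard("index")
    if "nofollow" in directives:
        directives.discard("follow")
    return directives


tags = [("robots", "index, follow"), ("googlebot", "noindex")]
print(sorted(effective_directives(tags, "Googlebot")))  # ['follow', 'noindex']
print(sorted(effective_directives(tags, "Bingbot")))    # ['follow', 'index']
```

Googlebot sees both tags and the more restrictive noindex wins; any other crawler only sees the generic robots tag.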
This basically means that if you really want to hide something from the search engines, and thus from people using search, robots.txt won’t suffice.
Indexer directives
Indexer directives are directives that are set on a per-page and/or per-element basis. Up until July 2007, there were two directives: the microformat rel="nofollow", which means that the link should not pass authority/PageRank, and the Meta Robots tag.
With the Meta Robots tag, you can really prevent search engines from showing pages you want to keep out of the search results. The same result can be achieved with the X-Robots-Tag HTTP header. As described earlier, the X-Robots-Tag gives you more flexibility by also allowing you to control how specific file types are indexed.
Example uses of the X-Robots-Tag
Using the X-Robots-Tag HTTP header
The X-Robots-Tag can be used as an element of the HTTP header response for a given URL. Any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: noindex**
_(…)_
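To check which X-Robots-Tag values a response actually carries, the header block can be parsed with the standard library's `email.parser`, since HTTP headers follow the same "Name: value" format. The sketch below uses the example response above with its elided headers left out; `x_robots_tag_values` is a hypothetical helper name.

```python
from email.parser import Parser

# The example response above, with the elided headers left out.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 25 May 2010 21:42:43 GMT\r\n"
    "X-Robots-Tag: noindex\r\n"
    "\r\n"
)


def x_robots_tag_values(raw_response):
    """Return every X-Robots-Tag header value in a raw HTTP response head."""
    # Drop the status line; the rest is an RFC 822-style header block.
    _status_line, _, header_block = raw_response.partition("\r\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    # Header lookup is case-insensitive; get_all() keeps repeated headers.
    return headers.get_all("X-Robots-Tag", [])


print(x_robots_tag_values(RAW_RESPONSE))  # ['noindex']
```

For a live URL, `urllib.request.urlopen(url).headers.get_all("X-Robots-Tag")` gives the same list without hand-parsing, because the response headers object supports the same `get_all` method.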
Multiple X-Robots-Tag headers can be combined within the HTTP response, or you can specify a comma-separated list of directives. Here's an example of an HTTP header response which has a noarchive X-Robots-Tag combined with an unavailable_after X-Robots-Tag:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: noarchive**
**X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST**
_(…)_
The X-Robots-Tag may optionally specify a user-agent before the directives. For instance, the following set of X-Robots-Tag HTTP headers can be used to conditionally allow showing of a page in search results for different search engines:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
_(…)_
**X-Robots-Tag: googlebot: nofollow**
**X-Robots-Tag: otherbot: noindex, nofollow**
_(…)_
Directives specified without a user-agent are valid for all crawlers. The section below demonstrates how to handle combined directives. Both the name and the specified values are not case sensitive.
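The header rules above (repeated headers, comma-separated lists, an optional user-agent prefix) can be turned into a per-crawler map. This is a rough sketch under stated assumptions: the set of directives that carry their own "name: value" syntax is an assumed, non-exhaustive list, and a date value containing a comma would be split incorrectly. `parse_x_robots_tags` and `VALUED_DIRECTIVES` are hypothetical names.

```python
# Directives that themselves use "name: value" syntax. A header value starting
# with one of these is a directive, not a user-agent. (Assumed, not exhaustive.)
VALUED_DIRECTIVES = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}


def parse_x_robots_tags(values):
    """Map each user-agent ('*' = all crawlers) to its directives.

    `values` is the list of X-Robots-Tag header values from one response.
    """
    rules = {}
    for value in values:
        agent = "*"
        head, sep, rest = value.partition(":")
        head = head.strip()
        # "googlebot: nofollow" -> user-agent prefix; "unavailable_after: ..." -> directive.
        if sep and " " not in head and "," not in head and head.lower() not in VALUED_DIRECTIVES:
            agent, value = head.lower(), rest
        for part in value.split(","):
            part = part.strip()
            if part:
                rules.setdefault(agent, []).append(part)
    return rules


print(parse_x_robots_tags(["noarchive", "unavailable_after: 25 Jun 2010 15:00:00 PST"]))
# {'*': ['noarchive', 'unavailable_after: 25 Jun 2010 15:00:00 PST']}
print(parse_x_robots_tags(["googlebot: nofollow", "otherbot: noindex, nofollow"]))
# {'googlebot': ['nofollow'], 'otherbot': ['noindex', 'nofollow']}
```

Entries under `'*'` apply to every crawler, matching the rule that directives without a user-agent are valid for all crawlers.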
- https://mza.seotoolninja.com/learn/seo/robotstxt
- https://yoast.com/ultimate-guide-robots-txt/
- https://mza.seotoolninja.com/blog/the-wonderful-world-of-seo-metatags
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
- https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
I hope this helps,
Tom
-
If you've recently added the "noindex" meta tag, have the page fetched with Fetch as Google in Google Webmaster Tools (GWT). Google can't act on the tag if it doesn't see it.
-
Hi Luke,
It's a pretty common misconception that robots.txt will prevent indexing. Its only purpose is actually to prevent crawling; anything disallowed in there is still up for indexing if it's linked to from elsewhere. If you want something deindexed, your best bet is the robots meta tag, but make sure you allow crawling of the URLs to give search engine bots an opportunity to see the tag.