Robots.txt - Googlebot - Allow... what's it for?

McTaggart

Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke

User-Agent: Googlebot
Allow: /.js
Allow: /.css

McTaggart

Thanks Tom - that's very useful - appreciated - and thanks also Clever PhD re: the robots.txt tester info - Luke

CleverPhD

Just as a follow-up to Tom's great post. If you were wanting to test a robots.txt setup, especially if you were using a wildcard or using an allow combined with a disallow, Google Search Console under the Crawl section has a robots.txt Tester. You will see your most recent robots.txt file there that Google has a copy of. You can then modify that version and then enter a URL at the bottom to see if everything is set correctly or not. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your robots.txt file, it is just for testing. Once you find the configuration that works, you would still need to update the robots.txt on your server.

TomRayner

Hi Luke

As you have correctly assumed, that particular robots command would be pointless.

Googlebot does follow allow rules (not every crawler does), but an allow rule is only useful as an exception to a disallow rule.

So, for example, if you had a rule that blocked pages within a sub-directory, with:

Disallow: /example/*

You could then add an allow rule so that a specific page within that directory can still be crawled, like:

Allow: /example/page.html

A couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the allow rule is the more specific (longer) one, it will prevail. It is also best practice to put your "allow" rules at the top of the robots.txt file, since some other crawlers apply the first matching rule rather than the longest one.
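To make the "longest path wins" idea concrete, here is a minimal Python sketch of how that precedence could be evaluated for the two rules above. It is not Google's actual parser: the helper names (pattern_to_regex, is_allowed) are made up for illustration, the extra test paths are hypothetical, and the tie-break (allow wins when two matching rules have equal length) is an assumption based on Google's documented "least restrictive rule" behaviour.

```python
import re

def pattern_to_regex(path_pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Return True if url_path may be crawled under the given rules.

    rules is a list of (directive, pattern) tuples for one user-agent group.
    The longest matching pattern wins; on a tie, allow wins (assumption);
    if nothing matches, crawling is allowed by default.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# The two rules from the example above.
rules = [("disallow", "/example/*"), ("allow", "/example/page.html")]

print(is_allowed(rules, "/example/page.html"))   # True: the longer allow rule wins
print(is_allowed(rules, "/example/other.html"))  # False: only the disallow rule matches
print(is_allowed(rules, "/about.html"))          # True: no rule matches
```

Note that the order of the rules in the list makes no difference to the result here, which is the point of the length-based rule.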

But in your example, if they have allow rules for JS and CSS files without having disallow rules for those directories/paths etc - it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.

TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
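If you want to see the "allowed by default" behaviour for yourself, a rough check with Python's standard-library urllib.robotparser looks like this. Bear in mind it is only an approximation: that parser is not Googlebot and uses simple prefix matching, the can_crawl helper is made up for illustration, and the /assets/ paths are hypothetical. The allow lines are copied as posted in the question.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The rules from the question, as posted.
with_allow = """User-agent: Googlebot
Allow: /.js
Allow: /.css
"""

# No rules at all.
without_allow = ""

# Hypothetical asset paths, for illustration only.
for path in ("/assets/site.js", "/assets/site.css"):
    print(path,
          can_crawl(with_allow, "Googlebot", path),
          can_crawl(without_allow, "Googlebot", path))
```

Both columns come out the same: with no disallow rule in play, the allow lines do not change the outcome.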

Hope this helps.
