Confused about robots.txt

Netpace

There is a lot of conflicting and/or unclear information about robots.txt out there. Somehow, I can't work out the best way to use robots.txt, even after visiting the official robots website. For example, I have the following format for my robots.txt:

User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode

Allow: /

Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml

Is this correct and/or recommended? If so, how come I see a list of over 200 links blocked by robots.txt when I check Google Webmaster Tools?

Help someone, anyone! Can't seem to understand this robotic business!

Regards,

crvw

Google may still index pages excluded by robots.txt if the pages are backlinked either internally or externally.

For best results, use meta noindex to tell search engines they're not allowed to show the link in results, and meta nofollow to tell robots not to follow any links on the page.

Webmaster Tools Help: Using meta tags to block access to your site

You can also explicitly address googlebot in the meta tag, as opposed to just robots. If you use both a robots.txt and meta robots tags and there are conflicting directives, googlebot will follow the most restrictive one.
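To double-check which meta directives a page actually carries, something like the following might help. It is only a rough sketch using Python's standard html.parser; the class name, the helper function, and the sample page are made up for illustration, and it only looks at meta tags named robots or googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        # Maps the meta name to its set of directives,
        # e.g. {"robots": {"noindex", "nofollow"}}
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def meta_directives(html):
    """Return the robots-related meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for this example.
page = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)
found = meta_directives(page)
print(found)
```

Keep in mind that a crawler can only read a meta tag on a page it is allowed to fetch, so a noindex tag on a URL that robots.txt blocks will most likely never be seen.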

irvingw

I would also recommend going to the Site configuration > Crawler access page in Google Webmaster Tools and testing many of your site's URLs to ensure that robots can access them. Test every unique URL format on your site, such as the search results page, product pages, category pages, etc. I always use this tool whenever I make any change to robots.txt.
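If you want a quick local check before uploading a change, Python's standard urllib.robotparser can do something similar. This is just a sketch, not a replacement for the Webmaster Tools tester: the rules are copied from the question, the test paths and the check_urls helper are made up, and Python's matching is not guaranteed to agree with googlebot's in every case.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the question; the test paths further down are only examples.
ROBOTS_TXT = """\
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode
"""


def check_urls(robots_txt, urls, agent="*"):
    """Return {url: bool}, where True means `agent` may fetch that URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


BASE = "http://www.example.tv"
results = check_urls(ROBOTS_TXT, [
    BASE + "/",
    BASE + "/news/some-story",   # hypothetical page
    BASE + "/images/logo.png",   # hypothetical image
    BASE + "/playEpisode/123",   # hypothetical episode URL
    BASE + "/javascript.js",
])
for url, allowed in results.items():
    print("allowed" if allowed else "BLOCKED", url)
```

With Python's parser, /javascript.js should come back as allowed, because the rule Disallow: javascript.js has no leading slash and so never prefix-matches a URL path. That line probably wants to be Disallow: /javascript.js.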

Entrusteddev

Hi,

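Allow: / isn't part of the original robots.txt standard (it's an extension that Google and some other crawlers honor), and it's redundant here anyway: anything that isn't disallowed is allowed by default.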

Other than that, it all looks good. Perhaps the 200 or so links to blocked pages were indexed before robots.txt was last updated with the disallows?
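To see that a trailing Allow: / changes nothing, you could compare the verdicts with and without it. A minimal sketch using Python's urllib.robotparser, with a shortened rule set and made-up test paths; the verdicts helper is hypothetical. Python applies the first matching rule while googlebot picks the most specific one, but for this file both approaches should give the same answers.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the rules from the question.
DISALLOWS = """\
User-agent: *
Disallow: /images/
Disallow: /embedconfig
Disallow: /playEpisode
"""


def verdicts(robots_txt, paths, agent="*"):
    """Return a list of booleans: may `agent` fetch each path?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch(agent, "http://www.example.tv" + p) for p in paths]


# Hypothetical paths: two that no rule mentions, three that are disallowed.
paths = ["/", "/about", "/images/a.png", "/embedconfig", "/playEpisode/1"]

with_allow = verdicts(DISALLOWS + "Allow: /\n", paths)
without_allow = verdicts(DISALLOWS, paths)

# Same verdicts either way: unlisted paths are allowed by default.
print(with_allow == without_allow)
```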

Regards

Aran
