Robots.txt Blocking - Best Practices
-
Hi All,
We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (user-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line.
We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files?
Thanks!
User-agent: * Disallow: / User-agent: Googlebot Disallow: Crawl-delay: 5 User-agent: Yahoo-slurp Disallow: User-agent: bingbot Disallow: User-agent: rogerbot Disallow: User-agent: * Crawl-delay: 5 Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp
-
Thanks for taking the time to respond in depth, GreenStone. We appreciate the advice and have passed your response along to the web hosting company (along with a frustrated email) explaining they're not adhering to anyone's best practices. Hopefully this will convince them!
-
Thanks, Dmitrii for your response! From our research we've seen similar recommendations and it helps to have more evidence to back it up. Hopefully these guys will give in a bit!
-
Completely agree, I really wouldn't want to host my stuff with a company that can't figure out what really the best practices are ;-). This is very well layed out why you shouldn't want to set up your robots.txt like it is right now.
-
In general, I definitely wouldn't recommend the way the web-provider is handling this.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
- It makes a lot more sense to just allow crawlers full access, and then add crawl delays for non google crawlers, in addition to disallowing those specific sub-folders: Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp.
- Googlebot Disallow: Crawl-delay: 5, does not do you any good, as google does not obey these commands. Only Search Console can control this.
- You can test what is visible to googlebot within search console's "robots" subsection, in order to verify what they can access.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for other crawlers aside from google would be best practice generally,
-
Here is another video from Matt - https://www.youtube.com/watch?v=I2giR-WKUfY
Lots of good points there too.
-
Hi.
Super weird client - that's for sure.
User-agent: * Disallow: /
Every bot will be blocked off! how in the world are they ranking?
watch that video, there are good ideas of bot and crawlers controlling. As well as you can consider that as best practices. And yes, what they have now is ridiculous.
https://mza.seotoolninja.com/community/q/should-we-use-google-s-crawl-delay-setting
Here is a q/a about crawler delays. As far as I know Google ignores delays anyway, plus there is nothing good about it anyway.
Hope this helps.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best SEO Practices for Displaying FAQs throughout site?
I've got an FAQ plugin (Ultimate FAQ) for a Wordpress site with tons of content (like 30 questions each with a page full, multi-paragraphs, of answers with good info -- stuff Google LOVES.) Right now, I have a main FAQ page that has 3 categories and about 10 questions under each category and each question is collapsed by default. You click an arrow to expand it to reveal the answer.I then have a single category's questions also displayed at the bottom of an appropriate related page. So the questions appear in two places on the site, always collapsed by default.Each question has a permalink that links to an individual page with only that question and answer.I know Google discounts (doesn't ignore) content that is hidden by default and requires a click (via js function) to reveal it.So what I'm wondering is if the way I have it setup is optimal for SEO? How is Google going to handle the questions being in essentially three places: it's own standalone page, in a list on a category page, and in a list on a page showing all questions for all categories. Should I make the questions not collapsed by default (which will make the master FAQ page SUPER long!)Does Google not mind the duplicate content within the site?What's the best strategy?
Intermediate & Advanced SEO | | SeoJaz0 -
Is it good practice of keeping all our pages at second level?
While defining the site structure we thought of having all pages at second level only. i.e. domain.com/services domain.com/city domain.com/services-in-city please let us know the pros and cons of having this as architecture.
Intermediate & Advanced SEO | | fabogo_marketing0 -
Best tips for getting a video page to rank?
We have a video for our company, located here: http://www.imageworkscreative.com/imageworks-creative-video It's an overview of our company and the services we offer. We'd like to get this page ranking, but we haven't had much luck so far. Our Youtube account does better, but I'm looking for some things we can do on or offsite to get this page to rank. Any tips would be appreciated!
Intermediate & Advanced SEO | | ScottImageWorks0 -
Huge increase in server errors and robots.txt
Hi Moz community! Wondering if someone can help? One of my clients (online fashion retailer) has been receiving huge increase in server errors (500's and 503's) over the last 6 weeks and it has got to the point where people cannot access the site because of server errors. The client has recently changed hosting companies to deal with this, and they have just told us they removed the DNS records once the name servers were changed, and they have now fixed this and are waiting for the name servers to propagate again. These errors also correlate with a huge decrease in pages blocked by robots.txt file, which makes me think someone has perhaps changed this and not told anyone... Anyone have any ideas here? It would be greatly appreciated! 🙂 I've been chasing this up with the dev agency and the hosting company for weeks, to no avail. Massive thanks in advance 🙂
Intermediate & Advanced SEO | | labelPR0 -
Best Format for URLs on large Ecommerce Site?
I saw this article, http://www.distilled.net/blog/seo/common-ecommerce-technical-seo-problems/, and noticed that Geoff mentioned that product URLs format should be in one of the following ways: Product Page: site.com/product-name Product Page: site.com/category/sub-category/product-name However, for SEO, is there a preferred way? I understand that the top one may be better to prevent duplicate page issues, but I would imagine that the bottom would be better for conversion (maybe the user backtracks to site.com/category/sub-category/ to see other products that he may be interested in). Also, I'd imagine that the top URL would not be a great way to distribute link juice since everything would be attached to the root, right?
Intermediate & Advanced SEO | | eTundra0 -
How best to handle (legitimate) duplicate content?
Hi everyone, appreciate any thoughts on this. (bit long, sorry) Am working on 3 sites selling the same thing...main difference between each site is physical location/target market area (think North, South, West as an example) Now, say these 3 sites all sell Blue Widgets, and thus all on-page optimisation has been done for this keyword. These 3 sites are now effectively duplicates of each other - well the Blue Widgets page is at least, and whist there are no 'errors' in Webmaster Tools am pretty sure they ought to be ranking better than they are (good PA, DA, mR etc) Sites share the same template/look and feel too AND are accessed via same IP - just for good measure 🙂 So - to questions/thoughts. 1 - Is it enough to try and get creative with on-page changes to try and 'de-dupe' them? Kinda tricky with Blue Widgets example - how many ways can you say that? I could focus on geographical element a bit more, but would like to rank well for Blue Widgets generally. 2 - I could, i guess, no-index, no-follow, blue widgets page on 2 of the sites, seems a bit drastic though. (or robots.txt them) 3 - I could even link (via internal navigation) sites 2 and 3 to site 1 Blue Widgets page and thus make 2 blue widget pages redundant? 4 - Is there anything HTML coding wise i could do to pull in Site 1 content to sites 2 and 3, without cloaking or anything nasty like that? I think 1- is first thing to do. Anything else? Many thanks.
Intermediate & Advanced SEO | | Capote0 -
Best practice to change the URL of all my site pages
Hi, I need to change all my site pages URL as a result of moving the site into another CMS platform that has its own URL structure: Currently the site is highly ranked for all relevant KWs I am targeting. All pages have backlinks Content and meta data should remain exactly the same. The domain should stay the same The plan is as follow: Set up the new site using a temporary domain name Copy over all content and meta data Set up all redirects (301) Update the domain name and point the live domain to the new one Watch closely for 404 errors and add any missing redirects Questions: Any comments on the plan? Is there a way (the above plan or any other) to make sure ranking will not be hurt What entries should I add to the sitemap.xml: new pages only or new pages and the pages from the old site? Thanks, Guy.
Intermediate & Advanced SEO | | jid1 -
Best Service for optimizing google product feed?
We're looking for a company that can help us optimize our google product feed. Does anyone have any recommendations or suggestions? Thanks!
Intermediate & Advanced SEO | | eric_since1910.com0