Robots.txt Blocking - Best Practices
-
Hi All,
We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (user-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line.
We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files?
Thanks!
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
Crawl-delay: 5

User-agent: Yahoo-slurp
Disallow:

User-agent: bingbot
Disallow:

User-agent: rogerbot
Disallow:

User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp
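A quick way to see which crawlers this file actually lets in is to feed it to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser. The www.example.com URL and the "ExampleCrawler" name are made up for illustration, and other parsers may differ in the details: this one only honours the first "User-agent: *" group, whereas Google documents that it combines groups naming the same agent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, one directive per line.
CURRENT_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
Crawl-delay: 5

User-agent: Yahoo-slurp
Disallow:

User-agent: bingbot
Disallow:

User-agent: rogerbot
Disallow:

User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp
"""


def access_report(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "ExampleCrawler" is a hypothetical bot that has no group of its own,
# so it falls back to the wildcard group.
AGENTS = ["Googlebot", "bingbot", "rogerbot", "ExampleCrawler"]

# Hypothetical homepage URL, used only for the check.
report = access_report(CURRENT_ROBOTS_TXT, AGENTS, "https://www.example.com/")

if __name__ == "__main__":
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under this parser the bots with their own group (Googlebot, bingbot, rogerbot) come back as allowed, and only crawlers without a named group are blocked by the leading "Disallow: /". A crawler that merges the two wildcard groups would still see "Disallow: /", so the outcome for unlisted bots is the same.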
-
Thanks for taking the time to respond in depth, GreenStone. We appreciate the advice and have passed your response along to the web hosting company (along with a frustrated email) explaining they're not adhering to anyone's best practices. Hopefully this will convince them!
-
Thanks, Dmitrii, for your response! From our research we've seen similar recommendations, and it helps to have more evidence to back them up. Hopefully these guys will give in a bit!
-
Completely agree, I really wouldn't want to host my stuff with a company that can't figure out what the best practices really are ;-). This lays out very well why you shouldn't set up your robots.txt the way it is right now.
-
In general, I definitely wouldn't recommend the way the web provider is handling this.
- Disallowing all while adding exceptions should never be the norm. Allowing all to crawl while adding exceptions for crawlers other than Google would generally be the best practice.
- It makes a lot more sense to just allow crawlers full access, then add crawl delays for non-Google crawlers, in addition to disallowing those specific pages and sub-folders:
  Disallow: /new_vehicle_detail.asp
  Disallow: /new_vehicle_compare.asp
  Disallow: /news_article.asp
  Disallow: /new_model_detail_print.asp
  Disallow: /used_bikes/
  Disallow: /default.asp?page=xCompareModels
  Disallow: /fiche_section_detail.asp
- In the Googlebot group, "Crawl-delay: 5" does not do you any good, as Google does not obey the Crawl-delay directive. Only Search Console can control Googlebot's crawl rate.
- You can test what is visible to Googlebot within Search Console's "robots" subsection, in order to verify what it can access.
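As a rough illustration of the allow-by-default layout described above, here is one possible robots.txt, checked with Python's standard-library urllib.robotparser. This is only a sketch under a few assumptions: the www.example.com URLs and the "ExampleCrawler" name are made up, and the disallowed paths are repeated in the Googlebot group because a crawler that matches its own group does not also read the "*" group.

```python
from urllib.robotparser import RobotFileParser

# A suggested layout (not the provider's file): everything is crawlable
# except the listed paths. Googlebot gets no Crawl-delay because Google
# ignores it; all other crawlers are asked to wait 5 seconds.
SUGGESTED_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp

User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp
"""

parser = RobotFileParser()
parser.parse(SUGGESTED_ROBOTS_TXT.splitlines())

# Hypothetical URLs on a placeholder domain, used only for the checks.
HOME = "https://www.example.com/"
USED_BIKES = "https://www.example.com/used_bikes/listing.asp"

checks = {
    # Named and unnamed crawlers can both reach the homepage.
    "googlebot_home": parser.can_fetch("Googlebot", HOME),
    "other_home": parser.can_fetch("ExampleCrawler", HOME),
    # The excluded folder stays off limits for everyone.
    "googlebot_used_bikes": parser.can_fetch("Googlebot", USED_BIKES),
    "other_used_bikes": parser.can_fetch("ExampleCrawler", USED_BIKES),
}

# Crawl-delay applies only to the wildcard group in this layout.
googlebot_delay = parser.crawl_delay("Googlebot")
other_delay = parser.crawl_delay("ExampleCrawler")

if __name__ == "__main__":
    for name, allowed in checks.items():
        print(f"{name}: {'allowed' if allowed else 'blocked'}")
    print("Googlebot crawl delay:", googlebot_delay)
    print("Other crawlers' crawl delay:", other_delay)
```

With this layout no crawler needs to be whitelisted by name, so new or less common bots (rogerbot included) fall under the "*" group instead of being locked out.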
-
Here is another video from Matt - https://www.youtube.com/watch?v=I2giR-WKUfY
Lots of good points there too.
-
Hi.
Super weird client - that's for sure.
User-agent: *
Disallow: /
On its own, that blocks every bot! How in the world are they ranking?
Watch that video; it has good ideas for controlling bots and crawlers, and you can treat them as best practices. And yes, what they have now is ridiculous.
https://mza.seotoolninja.com/community/q/should-we-use-google-s-crawl-delay-setting
Here is a Q&A about crawl delays. As far as I know, Google ignores crawl delays anyway, and there is nothing to gain from setting one.
Hope this helps.